Abstract. The purpose of this paper is to synthesize the approaches taken by Chatterjee-Meckes and Reinert-Röllin in adapting Stein's method of exchangeable pairs for multivariate normal approximation. The more general linear regression condition of Reinert-Röllin allows for wider applicability of the method, while the method of bounding the solution of the Stein equation due to Chatterjee-Meckes allows for improved convergence rates. Two abstract normal approximation theorems are proved, one for use when the underlying symmetries of the random variables are discrete, and one for use in contexts in which continuous symmetry groups are present. The application to runs on the line from Reinert-Röllin is reworked to demonstrate the improvement in convergence rates, and a new application to joint value distributions of eigenfunctions of the Laplace-Beltrami operator on a compact Riemannian manifold is presented.
1. Introduction

In 1972, Charles Stein [20] introduced a powerful new method for estimating the distance from a probability distribution on $\mathbb{R}$ to a Gaussian distribution. Central to the method was the notion of a characterizing operator: Stein observed that the standard normal distribution was the unique probability distribution $\mu$ with the property that
$$\int \big[f'(x) - xf(x)\big]\,\mu(dx) = 0$$
for all $f$ for which the left-hand side exists and is finite. The operator $T_o$ defined on functions by
$$T_of(x) = f'(x) - xf(x)$$
is called the characterizing operator of the standard normal distribution. The left-inverse to $T_o$, denoted $U_o$, is defined by the equation
$$T_o(U_of)(x) = f(x) - \mathbb{E}f(Z),$$
where $Z$ is a standard normal random variable; the boundedness properties of $U_o$ are an essential ingredient of Stein's method.
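As a concrete check of the characterization above, the sketch below evaluates $\mathbb{E}[f'(Y) - Yf(Y)]$ exactly for polynomial test functions $f$, with the law of $Y$ described by its moment sequence. The helper names (`gaussian_moment`, `rademacher_moment`, `stein_defect`) are hypothetical, and restricting to polynomials is an assumption made so that everything reduces to exact rational arithmetic; a fair random sign is included as a non-Gaussian comparison for which the identity fails.

```python
from fractions import Fraction

def gaussian_moment(m):
    """E[Z^m] for Z ~ N(0, 1): 0 for odd m, (m - 1)!! for even m."""
    if m % 2:
        return 0
    result = 1
    for j in range(m - 1, 0, -2):
        result *= j
    return result

def rademacher_moment(m):
    """E[S^m] for a fair random sign S (a non-Gaussian comparison)."""
    return 0 if m % 2 else 1

def stein_defect(coeffs, moment=gaussian_moment):
    """E[f'(Y) - Y f(Y)] for f(x) = sum_k coeffs[k] x^k, where the law of Y
    is specified through its moment sequence moment(k) = E[Y^k]."""
    deriv = [k * c for k, c in enumerate(coeffs)][1:]   # coefficients of f'
    times_x = [0] + list(coeffs)                         # coefficients of x f(x)
    deriv += [0] * (len(times_x) - len(deriv))           # pad f' to the same degree
    diff = [a - b for a, b in zip(deriv, times_x)]
    return sum(Fraction(c) * moment(k) for k, c in enumerate(diff))
```

For example, with $f(x) = x^3$ the Gaussian defect is $3\mathbb{E}Z^2 - \mathbb{E}Z^4 = 3 - 3 = 0$, while for a random sign it is $3 - 1 = 2$.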

Stein and many other authors continued to develop this method; in 1986, Stein published the book [21], which laid out his approach to the method, called the method of exchangeable pairs, in detail. Stein's method has proved very useful in situations in which local dependence or weak global dependence are present. One of the chief advantages of the method is that it is specifically a method for bounding the distance from a fixed distribution to Gaussian, and thus automatically produces concrete error bounds in limit theorems. The method is most naturally formulated by viewing probability measures as dual to various classes of functions, so that the notions of distance that arise are those which can be expressed as differences of expectations of test functions (e.g., the total variation distance, Wasserstein distance, or bounded Lipschitz distance). Several authors (particularly Bolthausen [1], Götze [7], Rinott and Rotar [17], and Shao and Su [19]) have extended the method to non-smooth test functions, such as indicator functions of intervals in $\mathbb{R}$ and indicator functions of convex sets in $\mathbb{R}^k$.

Heuristically, the univariate method of exchangeable pairs goes as follows. Let $W$ be a random variable conjectured to be approximately Gaussian; assume that $\mathbb{E}W = 0$ and $\mathbb{E}W^2 = 1$. From $W$, construct a new random variable $W'$ such that the pair $(W, W')$ has the same distribution as $(W', W)$. This is usually done by making a "small random change" in $W$, so that $W$ and $W'$ are close. Let $\Delta = W' - W$. If it can be verified that there is a $\lambda > 0$ such that

(1) $\mathbb{E}[\Delta \mid W] = -\lambda W + E_1$,

(2) $\mathbb{E}[\Delta^2 \mid W] = 2\lambda + E_2$,

(3) $\mathbb{E}[|\Delta|^3 \mid W] = E_3$,

with the random quantities $E_1, E_2, E_3$ being small compared to $\lambda$, then $W$ is indeed approximately Gaussian, and its distance to Gaussian (in some metric) can be bounded in terms of the $E_i$ and $\lambda$.
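The heuristic can be made concrete in the simplest case. The sketch below assumes $W = n^{-1/2}\sum_{i=1}^n \varepsilon_i$ with independent fair signs $\varepsilon_i$, and builds $W'$ by replacing one uniformly chosen sign with an independent copy (the "small random change"). It computes the conditional moments of $D = \sqrt{n}\,\Delta$ exactly with `fractions.Fraction`; the function name is hypothetical.

```python
from fractions import Fraction
from itertools import product

def conditional_moments(eps):
    """Exact E[D | eps], E[D^2 | eps], E[|D|^3 | eps] for D = sqrt(n) (W' - W),
    where W = n^{-1/2} sum(eps) and W' replaces eps[I] (I uniform) by a fresh
    fair sign."""
    n = len(eps)
    weight = Fraction(1, 2 * n)     # P(I = i) * P(new sign = s)
    m1 = m2 = m3 = Fraction(0)
    for e in eps:                   # the coordinate that gets resampled
        for new in (-1, 1):         # the independent replacement sign
            d = new - e             # sqrt(n) * (W' - W)
            m1 += weight * d
            m2 += weight * d * d
            m3 += weight * abs(d) ** 3
    return m1, m2, m3
```

Dividing by the appropriate powers of $\sqrt{n}$, the checks below say that (1)-(3) hold with $\lambda = 1/n$, $E_1 = E_2 = 0$ and $E_3 = 4n^{-3/2}$, so $E_3/\lambda = 4/\sqrt{n}$ is indeed small.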

While there had been successful uses of multivariate versions of Stein's method for normal approximation in the years following the introduction of the univariate method (e.g., by Götze [7], Rinott and Rotar [17], [18], and Raič [14]), there had not until recently been a version of the method of exchangeable pairs for use in a multivariate setting. This was first addressed in joint work by the author with S. Chatterjee [2], where several abstract normal approximation theorems, for approximating by standard Gaussian random vectors, were proved. The theorems were applied to estimate the rate of convergence in the multivariate central limit theorem and to show that rank $k$ projections of Haar measure on the orthogonal group $\mathcal{O}_n$ and the unitary group $\mathcal{U}_n$ are close to Gaussian measure on $\mathbb{R}^k$ (respectively $\mathbb{C}^k$), when $k = o(n)$. The condition in the theorems of [2] corresponding to condition (1) above was that, for an exchangeable pair of random vectors $(X, X')$,
$$\mathbb{E}[X' - X \mid X] = -\lambda X. \tag{4}$$
The addition of a random error to this equation was not needed in the applications in [2], but is a straightforward modification of the theorems proved there.

After the initial draft of [2] appeared on the arXiv, a preprint was posted by Reinert and Röllin [16] which generalized one of the abstract normal approximation theorems of [2]. Instead of condition (4) above, they required
$$\mathbb{E}[X' - X \mid X] = -\Lambda X + E, \tag{5}$$
where $\Lambda$ is a positive definite matrix and $E$ is a random error. This more general condition allowed them to estimate the distance to Gaussian random vectors with non-identity (even singular) covariance matrices. They then introduced an insightful new method, "the embedding method", for approximating real random variables by the normal distribution, by observing that in many cases in which the condition (1) does not hold, the random variable in question can be viewed as one component of a random vector which satisfies condition (5) with a non-diagonal $\Lambda$. Many examples are given, both of the embedding method and of the multivariate normal approximation theorem directly, including applications to runs on the line, statistics of Bernoulli random graphs, U-statistics, and doubly-indexed permutation statistics.
After [16] was posted, [2] underwent significant revisions, largely to change the metrics which were used on the space of probability measures on $\mathbb{R}^k$ and $\mathbb{C}^k$. As mentioned above, Stein's method works most naturally to compare measures by using (usually smooth) classes of test functions. The smoothness conditions used by Reinert and Röllin, and those initially used in [2], are to assume bounds on the quantities
$$|h|_r := \sup_{1 \le i_1, \ldots, i_r \le k}\Big\|\frac{\partial^r h}{\partial x_{i_1}\cdots\partial x_{i_r}}\Big\|_\infty.$$
The approach taken in the published version of [2] is to give smoothness conditions instead by requiring bounds on the quantities
$$M_r(h) := \sup_{x \in \mathbb{R}^k}\|D^rh(x)\|_{op},$$

where $\|D^rh(x)\|_{op}$ is the operator norm of the $r$-th derivative of $h$, as an $r$-linear form. These smoothness conditions seem preferable for several reasons. Firstly, they are more geometrically natural, as they are coordinate-free; they depend only on distances and not on the choice of orthonormal basis of $\mathbb{R}^k$. Particularly when approximating by the standard Gaussian distribution on $\mathbb{R}^k$, which is of course rotationally invariant, it seems desirable to have a notion of distance which is also rotationally invariant. In more practical terms, considering classes of functions defined in terms of bounds on the quantities $M_r$ and modifying the proofs of the abstract theorems accordingly allows for improved error bounds: the bound on the Wasserstein distance from a $k$-dimensional projection of Haar measure on $\mathcal{O}_n$ to standard Gauss measure given in the first version of [2] was improved, in the same metric, once the coordinate-free viewpoint was adopted. In Section 3 below, the example of runs on the line from [16] is reworked with this viewpoint, with essentially the same ingredients, to demonstrate that the rates of convergence obtained are improved. Finally, most of the bounds in [2] and below, and those from the main theorem in [16], require two or three derivatives, so that an additional smoothing argument is needed to move to one of the more usual metrics on probability measures (e.g., Wasserstein distance, total variation distance, or bounded Lipschitz distance). Starting from bounds in terms of the $M_r(h)$ instead of the $|h|_r$ typically produces better results in the final metric; compare, e.g., Proposition 3.2 of the original arXiv version of the paper [13] of M. Meckes with Corollary 3.5 of the published version, in which one of the abstract approximation theorems of [2] was applied to the study of the distribution of marginals of the uniform measure on high-dimensional convex bodies.
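The coordinate dependence of $|h|_r$ is already visible for quadratic test functions, whose Hessian is constant. The sketch below (hypothetical helper names; the dimension two and the quadratic $h(x) = (x_1^2 - x_2^2)/2$ are illustrative assumptions) rotates the coordinates and compares $|h|_2$, the largest absolute Hessian entry, with $M_2(h)$, the operator norm of the Hessian.

```python
import math

def matmul(A, B):
    """Product of two 2 x 2 matrices stored as nested lists."""
    return [[sum(A[i][k] * B[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def rotated_hessian(H, theta):
    """Hessian of x -> h(Rx) when h has constant Hessian H and R is the
    rotation by angle theta; it equals R^T H R."""
    c, s = math.cos(theta), math.sin(theta)
    R = [[c, -s], [s, c]]
    RT = [[c, s], [-s, c]]
    return matmul(matmul(RT, H), R)

def coordinate_sup(H):
    """|h|_2 for a quadratic h: the largest absolute entry of its Hessian."""
    return max(abs(entry) for row in H for entry in row)

def op_norm_sym2(H):
    """M_2(h) for a quadratic h: the operator norm of the symmetric 2 x 2
    Hessian, i.e. its largest absolute eigenvalue."""
    a, b, d = H[0][0], H[0][1], H[1][1]
    mid, rad = (a + d) / 2, math.hypot((a - d) / 2, b)
    return max(abs(mid + rad), abs(mid - rad))

H = [[1.0, 0.0], [0.0, -1.0]]   # Hessian of h(x) = (x_1^2 - x_2^2) / 2
for theta in (0.0, math.pi / 8, math.pi / 4):
    Hr = rotated_hessian(H, theta)
    print(f"theta={theta:.3f}  |h|_2={coordinate_sup(Hr):.4f}  M_2={op_norm_sym2(Hr):.4f}")
```

Under rotation by $\pi/8$, $|h|_2$ drops from $1$ to $1/\sqrt{2}$, while $M_2(h)$ stays equal to $1$ for every angle.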

The purpose of this paper is to synthesize the approaches taken by the author and Chatterjee in [2] and Reinert and Röllin in [16]. In Section 2, two preliminary lemmas are proved, identifying a characterizing operator for the Gaussian distribution on $\mathbb{R}^d$ with covariance matrix $\Sigma$ and bounding the derivatives of its left-inverse in terms of the quantities $M_r$. Then, two abstract normal approximation theorems are proved. The first is a synthesis of Theorem 2.3 of [2] and Theorem 2.1 of [16], in which the distance from $X$ to a Gaussian random variable with mean zero and covariance $\Sigma$ is bounded, for $X$ the first member of an exchangeable pair $(X, X')$ satisfying condition (5) above. The second approximation theorem is analogous to Theorem 2.4 of [2], and is for situations in which the underlying random variable possesses "continuous symmetries". A condition similar to (5) is used in that theorem as well. Finally, in Section 3, two applications are carried out. The first is simply a reworking of the runs on the line example of [16], making use of their analysis together with Theorem 3 below to obtain a better rate of convergence. The second application is to the joint value distribution of a finite sequence of orthonormal eigenfunctions of the Laplace-Beltrami operator on a compact Riemannian manifold. This is a multivariate version of the main theorem of [11]. As an example, the error bound of this theorem is computed explicitly for a certain class of flat tori.

1.1. Notation and conventions. The Wasserstein distance $d_W(X, Y)$ between the random variables $X$ and $Y$ is defined by
$$d_W(X, Y) = \sup_{M_1(g) \le 1}\big|\mathbb{E}g(X) - \mathbb{E}g(Y)\big|,$$
where $M_1(g) = \sup_{x \ne y}\frac{|g(x) - g(y)|}{|x - y|}$ is the Lipschitz constant of $g$. On the space of probability distributions with finite absolute first moment, Wasserstein distance induces a stronger topology than the usual one described by weak convergence, but not as strong as the topology induced by the total variation distance. See [1] for detailed discussion of the various notions of distance between probability distributions.
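In one dimension the supremum defining $d_W$ is attained by the monotone (sorted) coupling, which is a standard fact. The sketch below uses it to compute $d_W$ between two empirical distributions with the same number of equally weighted atoms; the equal-size assumption and the function name are choices made for illustration only.

```python
def wasserstein_1d(xs, ys):
    """d_W between the empirical measures of xs and ys on the real line,
    assuming both have the same number of equally weighted atoms; the
    optimal coupling matches order statistics."""
    if len(xs) != len(ys):
        raise ValueError("this sketch assumes samples of equal size")
    return sum(abs(a - b) for a, b in zip(sorted(xs), sorted(ys))) / len(xs)

# A shift by 1 moves every atom by exactly 1, so the distance is 1.
print(wasserstein_1d([3, 1, 2], [2, 0, 1]))
```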

We will use $\mathcal{N}(\mu, \Sigma)$ to denote the normal distribution on $\mathbb{R}^k$ with mean $\mu$ and covariance matrix $\Sigma$; unless otherwise stated, the random variable $Z = (Z_1, \ldots, Z_k)$ is understood to be a standard Gaussian random vector on $\mathbb{R}^k$.

In $\mathbb{R}^n$, the Euclidean inner product is denoted $\langle\cdot,\cdot\rangle$ and the Euclidean norm is denoted $|\cdot|$. On the space of real $n \times n$ matrices, the Hilbert-Schmidt inner product is defined by
$$\langle A, B\rangle_{H.S.} = \operatorname{Tr}(AB^T),$$
with corresponding norm
$$\|A\|_{H.S.} = \sqrt{\operatorname{Tr}(AA^T)}.$$
The operator norm of a matrix $A$ over $\mathbb{R}$ is defined by
$$\|A\|_{op} = \sup_{|v| = 1,\,|w| = 1}|\langle Av, w\rangle|.$$
More generally, if $A$ is a $k$-linear form on $\mathbb{R}^n$, the operator norm of $A$ is defined to be
$$\|A\|_{op} = \sup\{|A(u_1, \ldots, u_k)| : |u_1| = \cdots = |u_k| = 1\}.$$
The $n \times n$ identity matrix is denoted $I_n$ and the $n \times n$ matrix of all zeros is denoted $0_n$.
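These matrix norms are easy to compute directly. The sketch below (hypothetical helper names; matrices as plain nested lists) evaluates $\langle A, B\rangle_{H.S.} = \operatorname{Tr}(AB^T)$ and estimates $\|A\|_{op}$, the largest singular value, by power iteration on $A^TA$ from a random starting vector. The checks include the elementary comparison $\|A\|_{op} \le \|A\|_{H.S.} \le \sqrt{n}\,\|A\|_{op}$ for an $n \times n$ matrix, which is the source of the factor $\sqrt{d}$ relating $\tilde{M}_2$ and $M_2$ later on.

```python
import math
import random

def hs_inner(A, B):
    """<A, B>_{H.S.} = Tr(A B^T), i.e. the sum of entrywise products."""
    return sum(a * b for row_a, row_b in zip(A, B) for a, b in zip(row_a, row_b))

def hs_norm(A):
    return math.sqrt(hs_inner(A, A))

def op_norm(A, iters=500, seed=0):
    """||A||_op = sup_{|v|=|w|=1} <Av, w>, the largest singular value of A,
    estimated by power iteration on A^T A from a random starting vector."""
    rng = random.Random(seed)
    cols = len(A[0])
    v = [rng.gauss(0.0, 1.0) for _ in range(cols)]
    for _ in range(iters):
        Av = [sum(a * x for a, x in zip(row, v)) for row in A]
        w = [sum(A[i][j] * Av[i] for i in range(len(A))) for j in range(cols)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    Av = [sum(a * x for a, x in zip(row, v)) for row in A]
    return math.sqrt(sum(x * x for x in Av))
```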

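For $\Omega$ a domain in $\mathbb{R}^n$, the notation $C^k(\Omega)$ will be used for the space of $k$-times continuously differentiable real-valued functions on $\Omega$, and $C^k_o(\Omega) \subseteq C^k(\Omega)$ are those functions on $\Omega$ with compact support. The $k$-th derivative $D^kf(x)$ of a function $f \in C^k(\mathbb{R}^n)$ is a $k$-linear form on $\mathbb{R}^n$, given in coordinates by
$$\langle D^kf(x), (u_1, \ldots, u_k)\rangle = \sum_{i_1, \ldots, i_k = 1}^n \frac{\partial^kf}{\partial x_{i_1}\cdots\partial x_{i_k}}(x)\,(u_1)_{i_1}\cdots(u_k)_{i_k},$$
where $(u_j)_i$ denotes the $i$-th component of the vector $u_j$. For an intrinsic, coordinate-free development, see Federer [5]. For $f : \mathbb{R}^n \to \mathbb{R}$, sufficiently smooth, let
$$M_k(f) := \sup_{x \in \mathbb{R}^n}\|D^kf(x)\|_{op}. \tag{6}$$
In the case $k = 2$, define
$$\tilde{M}_2(f) := \sup_{x \in \mathbb{R}^n}\|\operatorname{Hess}f(x)\|_{H.S.}. \tag{7}$$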
Note also that
$$M_k(f) = \sup_{x \ne y}\frac{\|D^{k-1}f(x) - D^{k-1}f(y)\|_{op}}{|x - y|};$$
that is, $M_k(f)$ is the Lipschitz constant of the $(k-1)$-st derivative of $f$.

This general definition of $M_k$ is a departure from what was done by Raič in [15]; there, smoothness conditions on functions are also given in coordinate-independent ways, and $M_1$ and $M_2$ are defined as they are here, but in case $k = 3$, the quantity $M_3$ is defined as the Lipschitz constant of the Hessian with respect to the Hilbert-Schmidt norm as opposed to the operator norm.

2. Abstract Approximation Theorems 

This section contains the basic lemmas giving the Stein characterization of the multivariate Gaussian distribution and bounds to the solution of the Stein equation, together with two multivariate abstract normal approximation theorems and their proofs. The first theorem is a reworking of the theorem of Reinert and Röllin on multivariate normal approximation with the method of exchangeable pairs for vectors with non-identity covariance. The second is an analogous result in the context of "continuous symmetries" of the underlying random variable, as has been previously studied by the author in [12], [11], and (jointly with S. Chatterjee) in [2].

The following lemma gives a second-order characterizing operator for the Gaussian distribution with mean $0$ and covariance $\Sigma$ on $\mathbb{R}^d$. The characterizing operator for this distribution is already well-known. The proofs available in the literature generally rely on viewing the Stein equation in terms of the generator of the Ornstein-Uhlenbeck semi-group; the proof given here is direct.

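Lemma 1. Let $Z \in \mathbb{R}^d$ be a random vector with $\{Z_i\}_{i=1}^d$ independent, identically distributed standard Gaussian random variables, and let $Z_\Sigma = \Sigma^{1/2}Z$ for a symmetric, non-negative definite matrix $\Sigma$.

(1) If $f : \mathbb{R}^d \to \mathbb{R}$ is two times continuously differentiable and compactly supported, then
$$\mathbb{E}\big[\langle\operatorname{Hess}f(Z_\Sigma), \Sigma\rangle_{H.S.} - \langle Z_\Sigma, \nabla f(Z_\Sigma)\rangle\big] = 0.$$

(2) If $Y \in \mathbb{R}^d$ is a random vector such that
$$\mathbb{E}\big[\langle\operatorname{Hess}f(Y), \Sigma\rangle_{H.S.} - \langle Y, \nabla f(Y)\rangle\big] = 0$$
for every $f \in C^2(\mathbb{R}^d)$ with $\mathbb{E}\big|\langle\operatorname{Hess}f(Y), \Sigma\rangle_{H.S.} - \langle Y, \nabla f(Y)\rangle\big| < \infty$, then $\mathcal{L}(Y) = \mathcal{L}(Z_\Sigma)$.

(3) If $g \in C^\infty(\mathbb{R}^d)$, then the function
$$U_og(x) := \int_0^1\frac{1}{2t}\Big[\mathbb{E}g\big(\sqrt{t}\,x + \sqrt{1-t}\,Z_\Sigma\big) - \mathbb{E}g(Z_\Sigma)\Big]\,dt \tag{8}$$
is a solution to the differential equation
$$\langle x, \nabla h(x)\rangle - \langle\operatorname{Hess}h(x), \Sigma\rangle_{H.S.} = g(x) - \mathbb{E}g(Z_\Sigma). \tag{9}$$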
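Proof. Part (1) follows from integration by parts.

Part (2) follows easily from part (3): note that if $\mathbb{E}\big[\langle\operatorname{Hess}f(Y), \Sigma\rangle_{H.S.} - \langle Y, \nabla f(Y)\rangle\big] = 0$ for every $f \in C^2(\mathbb{R}^d)$ with $\mathbb{E}\big|\langle\operatorname{Hess}f(Y), \Sigma\rangle_{H.S.} - \langle Y, \nabla f(Y)\rangle\big| < \infty$, then for $g \in C^\infty_o(\mathbb{R}^d)$ given,
$$\mathbb{E}g(Y) - \mathbb{E}g(Z_\Sigma) = \mathbb{E}\big[\langle Y, \nabla(U_og)(Y)\rangle - \langle\operatorname{Hess}(U_og)(Y), \Sigma\rangle_{H.S.}\big] = 0,$$
and so $\mathcal{L}(Y) = \mathcal{L}(Z_\Sigma)$, since expectations of functions in $C^\infty_o(\mathbb{R}^d)$ determine a probability measure on $\mathbb{R}^d$.

For part (3), first note that since $g$ is Lipschitz, if $t \in (0, 1)$, then (using $1 - \sqrt{1-t} \le t$ and $\mathbb{E}|\Sigma^{1/2}Z| \le \sqrt{\operatorname{Tr}(\Sigma)}$)
$$\left|\frac{1}{2t}\Big[\mathbb{E}g\big(\sqrt{t}\,x + \sqrt{1-t}\,\Sigma^{1/2}Z\big) - \mathbb{E}g(\Sigma^{1/2}Z)\Big]\right| \le \frac{M_1(g)}{2t}\,\mathbb{E}\big|\sqrt{t}\,x + (\sqrt{1-t} - 1)\Sigma^{1/2}Z\big| \le \frac{M_1(g)}{2t}\Big(\sqrt{t}\,|x| + t\sqrt{\operatorname{Tr}(\Sigma)}\Big),$$
which is integrable on $(0, 1)$, so the integral in (8) converges absolutely. To show that $U_og$ is indeed a solution to the differential equation (9), let
$$Z_{x,t} = \sqrt{t}\,x + \sqrt{1-t}\,\Sigma^{1/2}Z$$
and observe that
$$\begin{aligned}g(x) - \mathbb{E}g(\Sigma^{1/2}Z) &= \int_0^1\frac{d}{dt}\mathbb{E}g(Z_{x,t})\,dt \\ &= \int_0^1\frac{1}{2\sqrt{t}}\mathbb{E}\langle x, \nabla g(Z_{x,t})\rangle\,dt - \int_0^1\frac{1}{2\sqrt{1-t}}\mathbb{E}\langle\Sigma^{1/2}Z, \nabla g(Z_{x,t})\rangle\,dt \\ &= \int_0^1\frac{1}{2\sqrt{t}}\mathbb{E}\langle x, \nabla g(Z_{x,t})\rangle\,dt - \int_0^1\frac{1}{2}\mathbb{E}\langle\operatorname{Hess}g(Z_{x,t}), \Sigma\rangle_{H.S.}\,dt\end{aligned}$$
by integration by parts on the Gaussian expectation. Noting that
$$\operatorname{Hess}(U_og)(x) = \int_0^1\frac{1}{2}\mathbb{E}\big[\operatorname{Hess}g(Z_{x,t})\big]\,dt$$
and
$$\langle x, \nabla(U_og)(x)\rangle = \int_0^1\frac{1}{2\sqrt{t}}\mathbb{E}\langle x, \nabla g(Z_{x,t})\rangle\,dt$$
completes part (3).

□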
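Part (3) of Lemma 1 can be checked numerically in a case with closed forms. The sketch assumes $d = 1$, $\Sigma = \sigma^2$ and $g = \sin$, for which $\mathbb{E}\sin(a + bZ) = e^{-b^2/2}\sin a$ and $\mathbb{E}\sin(\sigma Z) = 0$. Substituting $t = u^2$ in (8) gives $U_og(x) = \int_0^1 \sin(ux)\,e^{-(1-u^2)\sigma^2/2}\,u^{-1}\,du$; this is differentiated under the integral sign and checked against (9), which here reads $xh'(x) - \sigma^2h''(x) = \sin x$, using Simpson's rule. The function names are hypothetical.

```python
import math

def simpson(f, a, b, n=2000):
    """Composite Simpson rule on [a, b] with an even number n of subintervals."""
    h = (b - a) / n
    total = f(a) + f(b)
    for k in range(1, n):
        total += (4 if k % 2 else 2) * f(a + k * h)
    return total * h / 3

def stein_residual(x, sigma):
    """x h'(x) - sigma^2 h''(x) - (g(x) - E g(sigma Z)) for h = U_o g, g = sin,
    d = 1, Sigma = sigma^2, after the substitution t = u^2 in (8)."""
    phi = lambda u: math.exp(-(1.0 - u * u) * sigma ** 2 / 2)
    h1 = simpson(lambda u: math.cos(u * x) * phi(u), 0.0, 1.0)        # h'(x)
    h2 = simpson(lambda u: -u * math.sin(u * x) * phi(u), 0.0, 1.0)   # h''(x)
    return x * h1 - sigma ** 2 * h2 - math.sin(x)                     # E sin(sigma Z) = 0
```

The residual also vanishes analytically: $\sigma^2u\,e^{-(1-u^2)\sigma^2/2}$ is the derivative of $e^{-(1-u^2)\sigma^2/2}$, and an integration by parts in $u$ cancels the $xh'(x)$ term.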



The next lemma gives useful bounds on $U_og$ and its derivatives in terms of $g$ and its derivatives. As in [15], bounds are most naturally given in terms of the quantities $M_i(g)$ defined in the introduction (with $M_0(g) := \sup_x|g(x)|$, the case $k = 0$ of (6)).

Lemma 2. For $g : \mathbb{R}^d \to \mathbb{R}$ given, $U_og$ satisfies the following bounds:

(1) $M_k(U_og) \le \frac{1}{k}M_k(g)$ for all $k \ge 1$.

(2) $\tilde{M}_2(U_og) \le \frac{1}{2}\tilde{M}_2(g)$.

If, in addition, $\Sigma$ is positive definite, then

(3) $M_1(U_og) \le \sqrt{\frac{\pi}{2}}\,M_0(g)\,\|\Sigma^{-1/2}\|_{op}$.

(4) $\tilde{M}_2(U_og) \le M_1(g)\,\|\Sigma^{-1/2}\|_{op}$.

(5) $M_3(U_og) \le \frac{\sqrt{2\pi}}{4}\,M_2(g)\,\|\Sigma^{-1/2}\|_{op}$.

Remark: Bounds (3), (4), and (5) are mainly of use when $\Sigma$ has a fairly simple form, since they require an estimate for $\|\Sigma^{-1/2}\|_{op}$. They are also of theoretical interest, since they show that if $\Sigma$ is non-singular, then the operator $U_o$ is smoothing; functions $U_og$ are typically one order smoother than $g$. The bounds (1) and (2), while not showing the smoothing behavior of $U_o$, are useful when $\Sigma$ is complicated (or singular) and an estimate of $\|\Sigma^{-1/2}\|_{op}$ is infeasible or impossible.
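Continuing the same one-dimensional example (again assuming $g = \sin$ and $\Sigma = \sigma^2$), the sketch below evaluates $\sup_x|(U_og)'(x)|$ and $\sup_x|(U_og)''(x)|$ on a grid and compares them with bound (1) of Lemma 2 for $k = 1, 2$ (here $M_1(g) = M_2(g) = 1$) and with bound (3) (here $M_0(g) = 1$ and $\|\Sigma^{-1/2}\|_{op} = 1/\sigma$). A grid maximum only approximates the supremum, so this is a sanity check rather than a proof; the helper names are hypothetical.

```python
import math

def simpson(f, a, b, n=400):
    """Composite Simpson rule on [a, b] with an even number n of subintervals."""
    h = (b - a) / n
    total = f(a) + f(b)
    for k in range(1, n):
        total += (4 if k % 2 else 2) * f(a + k * h)
    return total * h / 3

def solution_derivatives(x, sigma):
    """(h'(x), h''(x)) for h = U_o g with g = sin, d = 1, Sigma = sigma^2."""
    phi = lambda u: math.exp(-(1.0 - u * u) * sigma ** 2 / 2)
    h1 = simpson(lambda u: math.cos(u * x) * phi(u), 0.0, 1.0)
    h2 = simpson(lambda u: -u * math.sin(u * x) * phi(u), 0.0, 1.0)
    return h1, h2

def lemma2_check(sigma, xs):
    """Grid suprema of |h'| and |h''| next to the corresponding bounds."""
    derivs = [solution_derivatives(x, sigma) for x in xs]
    return {
        "M1(Ug)": max(abs(d1) for d1, _ in derivs),
        "M2(Ug)": max(abs(d2) for _, d2 in derivs),
        "bound (1), k=1": 1.0,                               # M_1(g) / 1
        "bound (1), k=2": 0.5,                               # M_2(g) / 2
        "bound (3)": math.sqrt(math.pi / 2) / sigma,         # sqrt(pi/2) M_0(g) / sigma
    }

print(lemma2_check(1.0, [i / 20 for i in range(-200, 201)]))
```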

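Proof of Lemma 2. Write $h(x) = U_og(x)$ and $Z_{x,t} = \sqrt{t}\,x + \sqrt{1-t}\,\Sigma^{1/2}Z$. Note that by the formula for $U_og$,
$$D^kh(x) = \int_0^1(2t)^{-1}t^{k/2}\,\mathbb{E}\big[D^kg(Z_{x,t})\big]\,dt. \tag{10}$$
Thus
$$\big|\langle D^k(U_og)(x), (u_1, \ldots, u_k)\rangle\big| \le \int_0^1\frac{t^{k/2-1}}{2}\,\mathbb{E}\big|\langle D^kg(Z_{x,t}), (u_1, \ldots, u_k)\rangle\big|\,dt \le \frac{1}{k}M_k(g)$$
for unit vectors $u_1, \ldots, u_k$, and part (1) follows immediately.

For the second part, note that (10) implies that
$$\operatorname{Hess}h(x) = \int_0^1\frac{1}{2}\mathbb{E}\big[\operatorname{Hess}g(Z_{x,t})\big]\,dt.$$
Fix a $d \times d$ matrix $A$. Then
$$|\langle\operatorname{Hess}h(x), A\rangle_{H.S.}| \le \int_0^1\frac{1}{2}\mathbb{E}|\langle\operatorname{Hess}g(Z_{x,t}), A\rangle_{H.S.}|\,dt \le \frac{1}{2}\Big(\sup_x\|\operatorname{Hess}g(x)\|_{H.S.}\Big)\|A\|_{H.S.},$$
hence part (2).

For part (3), note that it follows by integration by parts on the Gaussian expectation that
$$\frac{\partial h}{\partial x_i}(x) = \int_0^1\frac{1}{2\sqrt{t}}\mathbb{E}\Big[\frac{\partial g}{\partial x_i}\big(\sqrt{t}\,x + \sqrt{1-t}\,\Sigma^{1/2}Z\big)\Big]dt = \int_0^1\frac{1}{2\sqrt{t(1-t)}}\mathbb{E}\Big[\big(\Sigma^{-1/2}Z\big)_i\,g\big(\sqrt{t}\,x + \sqrt{1-t}\,\Sigma^{1/2}Z\big)\Big]dt;$$
thus
$$\nabla h(x) = \int_0^1\frac{1}{2\sqrt{t(1-t)}}\mathbb{E}\big[g(Z_{x,t})\,\Sigma^{-1/2}Z\big]\,dt,$$
and for a unit vector $v$,
$$|\langle\nabla h(x), v\rangle| \le M_0(g)\,\mathbb{E}|\langle\Sigma^{-1/2}Z, v\rangle|\int_0^1\frac{dt}{2\sqrt{t(1-t)}}.$$
Now, $\mathbb{E}|\langle\Sigma^{-1/2}Z, v\rangle| = \sqrt{\frac{2}{\pi}}\,|\Sigma^{-1/2}v| \le \sqrt{\frac{2}{\pi}}\,\|\Sigma^{-1/2}\|_{op}$, since $\langle\Sigma^{-1/2}Z, v\rangle$ is a univariate Gaussian random variable with variance $|\Sigma^{-1/2}v|^2$, and $\int_0^1\frac{dt}{2\sqrt{t(1-t)}} = \frac{\pi}{2}$. This completes part (3).

For part (4), again using integration by parts on the Gaussian expectation,
$$\frac{\partial^2h}{\partial x_i\partial x_j}(x) = \int_0^1\frac{1}{2}\mathbb{E}\Big[\frac{\partial^2g}{\partial x_i\partial x_j}(Z_{x,t})\Big]dt = \int_0^1\frac{1}{2\sqrt{1-t}}\mathbb{E}\Big[\big(\Sigma^{-1/2}Z\big)_i\frac{\partial g}{\partial x_j}(Z_{x,t})\Big]dt, \tag{11}$$
and so
$$\operatorname{Hess}h(x) = \int_0^1\frac{1}{2\sqrt{1-t}}\mathbb{E}\Big[\Sigma^{-1/2}Z\,\big(\nabla g(Z_{x,t})\big)^T\Big]dt. \tag{12}$$
Fix a $d \times d$ matrix $A$. Then
$$\langle\operatorname{Hess}h(x), A\rangle_{H.S.} = \int_0^1\frac{1}{2\sqrt{1-t}}\mathbb{E}\big[\langle A^T\Sigma^{-1/2}Z, \nabla g(Z_{x,t})\rangle\big]\,dt,$$
thus
$$|\langle\operatorname{Hess}h(x), A\rangle_{H.S.}| \le M_1(g)\,\mathbb{E}|A^T\Sigma^{-1/2}Z|\int_0^1\frac{dt}{2\sqrt{1-t}} = M_1(g)\,\mathbb{E}|A^T\Sigma^{-1/2}Z|.$$
By the Cauchy-Schwarz inequality,
$$\mathbb{E}|A^T\Sigma^{-1/2}Z| \le \big(\mathbb{E}|A^T\Sigma^{-1/2}Z|^2\big)^{1/2} = \|A^T\Sigma^{-1/2}\|_{H.S.} \le \|\Sigma^{-1/2}\|_{op}\,\|A\|_{H.S.}.$$
It follows that
$$\|\operatorname{Hess}h(x)\|_{H.S.} \le M_1(g)\,\|\Sigma^{-1/2}\|_{op}$$
for all $x \in \mathbb{R}^d$, hence part (4).

For part (5), let $u$ and $v$ be fixed vectors in $\mathbb{R}^d$ with $|u| = |v| = 1$. Then it follows from (12) that
$$\langle(\operatorname{Hess}h(x) - \operatorname{Hess}h(y))u, v\rangle = \int_0^1\frac{1}{2\sqrt{1-t}}\mathbb{E}\Big[\langle\Sigma^{-1/2}Z, v\rangle\,\langle\nabla g(Z_{x,t}) - \nabla g(Z_{y,t}), u\rangle\Big]dt,$$
and so, since $|Z_{x,t} - Z_{y,t}| = \sqrt{t}\,|x - y|$,
$$\begin{aligned}|\langle(\operatorname{Hess}h(x) - \operatorname{Hess}h(y))u, v\rangle| &\le |x - y|\,M_2(g)\,\mathbb{E}|\langle Z, \Sigma^{-1/2}v\rangle|\int_0^1\frac{\sqrt{t}}{2\sqrt{1-t}}\,dt \\ &= |x - y|\,M_2(g)\,|\Sigma^{-1/2}v|\sqrt{\frac{2}{\pi}}\cdot\frac{\pi}{4} \\ &\le |x - y|\,M_2(g)\,\|\Sigma^{-1/2}\|_{op}\,\frac{\sqrt{2\pi}}{4}.\end{aligned}$$

□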
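For reference, the one-dimensional integrals and the Gaussian absolute moment used in parts (3) and (5) follow from the substitution $t = \sin^2\theta$ (so that $dt = 2\sin\theta\cos\theta\,d\theta$) and the fact that $\mathbb{E}|N(0, s^2)| = s\sqrt{2/\pi}$:

```latex
\int_0^1 \frac{dt}{2\sqrt{t(1-t)}} = \int_0^{\pi/2} d\theta = \frac{\pi}{2},
\qquad
\int_0^1 \frac{\sqrt{t}\,dt}{2\sqrt{1-t}} = \int_0^{\pi/2} \sin^2\theta \, d\theta = \frac{\pi}{4},
\qquad
\frac{\pi}{2}\sqrt{\frac{2}{\pi}} = \sqrt{\frac{\pi}{2}},
\qquad
\frac{\pi}{4}\sqrt{\frac{2}{\pi}} = \frac{\sqrt{2\pi}}{4}.
```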
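Theorem 3. Let $(X, X')$ be an exchangeable pair of random vectors in $\mathbb{R}^d$. Suppose that there is an invertible matrix $\Lambda$, a symmetric, non-negative definite matrix $\Sigma$, a random vector $E$ and a random matrix $E'$ such that

(1) $\mathbb{E}[X' - X \mid X] = -\Lambda X + \mathbb{E}[E \mid X]$,

(2) $\mathbb{E}[(X' - X)(X' - X)^T \mid X] = 2\Lambda\Sigma + \mathbb{E}[E' \mid X]$.

Then for $g \in C^3(\mathbb{R}^d)$,
$$\begin{aligned}\big|\mathbb{E}g(X) - \mathbb{E}g(\Sigma^{1/2}Z)\big| &\le \|\Lambda^{-1}\|_{op}\Big[M_1(g)\,\mathbb{E}|E| + \frac{1}{4}\tilde{M}_2(g)\,\mathbb{E}\|E'\|_{H.S.} + \frac{1}{12}M_3(g)\,\mathbb{E}|X' - X|^3\Big] \\ &\le \|\Lambda^{-1}\|_{op}\Big[M_1(g)\,\mathbb{E}|E| + \frac{\sqrt{d}}{4}M_2(g)\,\mathbb{E}\|E'\|_{H.S.} + \frac{1}{12}M_3(g)\,\mathbb{E}|X' - X|^3\Big],\end{aligned} \tag{13}$$
where $Z$ is a standard Gaussian random vector in $\mathbb{R}^d$. If $\Sigma$ is non-singular, then for $g \in C^2(\mathbb{R}^d)$,
$$\big|\mathbb{E}g(X) - \mathbb{E}g(\Sigma^{1/2}Z)\big| \le M_1(g)\,\|\Lambda^{-1}\|_{op}\Big[\mathbb{E}|E| + \frac{1}{2}\|\Sigma^{-1/2}\|_{op}\,\mathbb{E}\|E'\|_{H.S.}\Big] + \frac{\sqrt{2\pi}}{16}M_2(g)\,\|\Sigma^{-1/2}\|_{op}\,\|\Lambda^{-1}\|_{op}\,\mathbb{E}|X' - X|^3. \tag{14}$$

Proof. Fix $g$, and let $U_og$ be as in Lemma 1. Note that it suffices to assume that $g \in C^\infty(\mathbb{R}^d)$: let $h : \mathbb{R}^d \to \mathbb{R}$ be a centered Gaussian density with covariance matrix $\epsilon^2I_d$. Approximate $g$ by $g * h$; clearly $\|g * h - g\|_\infty \to 0$ as $\epsilon \to 0$, and by Young's inequality, $M_k(g * h) \le M_k(g)$ for all $k \ge 1$.

For notational convenience, let $f = U_og$. By the exchangeability of $(X, X')$,
$$\begin{aligned}0 &= \frac{1}{2}\mathbb{E}\big[\langle\Lambda^{-1}(X' - X), \nabla f(X') + \nabla f(X)\rangle\big] \\ &= \mathbb{E}\Big[\frac{1}{2}\langle\Lambda^{-1}(X' - X), \nabla f(X') - \nabla f(X)\rangle + \langle\Lambda^{-1}(X' - X), \nabla f(X)\rangle\Big] \\ &= \mathbb{E}\Big[\frac{1}{2}\langle\operatorname{Hess}f(X), \Lambda^{-1}(X' - X)(X' - X)^T\rangle_{H.S.} + \langle\Lambda^{-1}(X' - X), \nabla f(X)\rangle + R\Big],\end{aligned}$$
where $R$ is the error in the Taylor approximation. By conditions (1) and (2), it follows that
$$0 = \mathbb{E}\Big[\langle\operatorname{Hess}f(X), \Sigma\rangle_{H.S.} - \langle X, \nabla f(X)\rangle + \frac{1}{2}\langle\operatorname{Hess}f(X), \Lambda^{-1}E'\rangle_{H.S.} + \langle\nabla f(X), \Lambda^{-1}E\rangle + R\Big];$$
that is (making use of the definition of $f$),
$$\mathbb{E}g(X) - \mathbb{E}g(\Sigma^{1/2}Z) = \mathbb{E}\Big[\frac{1}{2}\langle\operatorname{Hess}f(X), \Lambda^{-1}E'\rangle_{H.S.} + \langle\nabla f(X), \Lambda^{-1}E\rangle + R\Big]. \tag{15}$$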
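Next,
$$\begin{aligned}\mathbb{E}\Big|\frac{1}{2}\langle\operatorname{Hess}f(X), \Lambda^{-1}E'\rangle_{H.S.}\Big| &\le \frac{1}{2}\Big(\sup_x\|\operatorname{Hess}f(x)\|_{H.S.}\Big)\,\mathbb{E}\|\Lambda^{-1}E'\|_{H.S.} \\ &\le \frac{1}{2}\Big(\sup_x\|\operatorname{Hess}f(x)\|_{H.S.}\Big)\,\|\Lambda^{-1}\|_{op}\,\mathbb{E}\|E'\|_{H.S.} \\ &\le \frac{1}{2}\|\Lambda^{-1}\|_{op}\,\mathbb{E}\|E'\|_{H.S.}\,\min\Big\{\frac{1}{2}\tilde{M}_2(g),\ M_1(g)\,\|\Sigma^{-1/2}\|_{op}\Big\},\end{aligned}$$
where the first line is by the Cauchy-Schwarz inequality, the second is by the standard bound $\|AB\|_{H.S.} \le \|A\|_{op}\|B\|_{H.S.}$, and the third uses the bounds (2) and (4) from Lemma 2. Similarly, by bound (1) of Lemma 2,
$$\mathbb{E}|\langle\nabla f(X), \Lambda^{-1}E\rangle| \le M_1(f)\,\|\Lambda^{-1}\|_{op}\,\mathbb{E}|E| \le M_1(g)\,\|\Lambda^{-1}\|_{op}\,\mathbb{E}|E|.$$
Finally, by Taylor's theorem and Lemma 2,
$$|R| \le \frac{M_3(f)}{4}\,|X' - X|^2\,|\Lambda^{-1}(X' - X)| \le \frac{1}{4}\|\Lambda^{-1}\|_{op}\,|X' - X|^3\,\min\Big\{\frac{1}{3}M_3(g),\ \frac{\sqrt{2\pi}}{4}M_2(g)\,\|\Sigma^{-1/2}\|_{op}\Big\}.$$
The first bound of the theorem results from choosing the first term from each minimum; the second bound results from the second terms.

□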
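To see the quantities in Theorem 3 in a concrete case, the sketch below assumes $X = n^{-1/2}(Y_1 + \cdots + Y_n)$ with $Y_i$ i.i.d. uniform on $\{-1, 1\}^d$, and $X'$ obtained by replacing a uniformly chosen $Y_I$ with an independent copy. Then condition (1) holds with $\Lambda = n^{-1}I_d$, $\Sigma = I_d$ and $E = 0$, and $E'$ can be taken to be $\mathbb{E}[(X' - X)(X' - X)^T \mid Y] - 2\Lambda$. All expectations are computed by exhaustive enumeration, so only small $n$ and $d$ are feasible; the example and the function name are illustrative assumptions, not taken from the text.

```python
import itertools
import math

def theorem3_terms(n, d=2):
    """Exact E||E'||_{H.S.} and E|X' - X|^3 for X = n^{-1/2}(Y_1 + ... + Y_n),
    Y_i i.i.d. uniform on {-1, 1}^d, X' obtained by replacing Y_I (I uniform)
    with an independent copy. Here Lambda = I_d / n, Sigma = I_d and E = 0.
    Also returns the largest violation of condition (1) as a sanity check."""
    signs = list(itertools.product((-1, 1), repeat=d))
    lam = 1.0 / n
    w = 1.0 / (n * len(signs))                  # P(I = i) * P(Y' = y')
    configs = list(itertools.product(signs, repeat=n))
    e_hs = e_cube = drift_err = 0.0
    for ys in configs:
        second = [[0.0] * d for _ in range(d)]  # E[(X'-X)(X'-X)^T | Y]
        drift = [0.0] * d                       # E[X' - X | Y]
        cube = 0.0                              # E[|X' - X|^3 | Y]
        for y in ys:
            for y_new in signs:
                delta = [(a - b) / math.sqrt(n) for a, b in zip(y_new, y)]
                for r in range(d):
                    drift[r] += w * delta[r]
                    for s in range(d):
                        second[r][s] += w * delta[r] * delta[s]
                cube += w * math.sqrt(sum(t * t for t in delta)) ** 3
        x = [sum(y[r] for y in ys) / math.sqrt(n) for r in range(d)]
        drift_err = max(drift_err, max(abs(drift[r] + lam * x[r]) for r in range(d)))
        # E' := E[(X'-X)(X'-X)^T | Y] - 2 Lambda Sigma
        e_prime = [[second[r][s] - (2 * lam if r == s else 0.0) for s in range(d)]
                   for r in range(d)]
        e_hs += math.sqrt(sum(v * v for row in e_prime for v in row))
        e_cube += cube
    return e_hs / len(configs), e_cube / len(configs), drift_err

for n in (2, 3, 4, 5):
    e_hs, e_cube, _ = theorem3_terms(n)
    # ||Lambda^{-1}||_op = n multiplies both error terms in the bound
    print(n, n * e_hs, n * e_cube)
```

Since $\|\Lambda^{-1}\|_{op} = n$, the printed products $n\,\mathbb{E}\|E'\|_{H.S.}$ and $n\,\mathbb{E}|X' - X|^3$ are the quantities multiplying the second- and third-derivative terms (up to constants), and both decay like $n^{-1/2}$; for $d = 2$ one has $\mathbb{E}|X' - X|^3 = (4 + 4\sqrt{2})\,n^{-3/2}$ exactly.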

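Theorem 4. Let $X$ be a random vector in $\mathbb{R}^d$ and, for each $\epsilon \in (0, 1)$, suppose that $(X, X_\epsilon)$ is an exchangeable pair. Suppose that there is an invertible matrix $\Lambda$, a symmetric, non-negative definite matrix $\Sigma$, a random vector $E$, a random matrix $E'$, and a deterministic function $s(\epsilon)$ such that

(1) $\frac{1}{s(\epsilon)}\mathbb{E}[X_\epsilon - X \mid X] \to -\Lambda X + \mathbb{E}[E \mid X]$ in $L^1$ as $\epsilon \to 0$;

(2) $\frac{1}{s(\epsilon)}\mathbb{E}[(X_\epsilon - X)(X_\epsilon - X)^T \mid X] \to 2\Lambda\Sigma + \mathbb{E}[E' \mid X]$ in $L^1$ as $\epsilon \to 0$;

(3) For each $\rho > 0$,
$$\lim_{\epsilon \to 0}\frac{1}{s(\epsilon)}\mathbb{E}\Big[|X_\epsilon - X|^2\,\mathbb{1}\big(|X_\epsilon - X| > \rho\big)\Big] = 0.$$

Then for $g \in C^2(\mathbb{R}^d)$,
$$\begin{aligned}\big|\mathbb{E}g(X) - \mathbb{E}g(\Sigma^{1/2}Z)\big| &\le \|\Lambda^{-1}\|_{op}\Big[M_1(g)\,\mathbb{E}|E| + \frac{1}{4}\tilde{M}_2(g)\,\mathbb{E}\|E'\|_{H.S.}\Big] \\ &\le \|\Lambda^{-1}\|_{op}\Big[M_1(g)\,\mathbb{E}|E| + \frac{\sqrt{d}}{4}M_2(g)\,\mathbb{E}\|E'\|_{H.S.}\Big],\end{aligned} \tag{16}$$
where $Z$ is a standard Gaussian random vector in $\mathbb{R}^d$. If $\Sigma$ is non-singular, then
$$d_W(X, \Sigma^{1/2}Z) \le \|\Lambda^{-1}\|_{op}\Big[\mathbb{E}|E| + \frac{1}{2}\|\Sigma^{-1/2}\|_{op}\,\mathbb{E}\|E'\|_{H.S.}\Big]. \tag{17}$$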



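Proof. Fix $g$, and let $U_og$ be as in Lemma 1. As in the proof of Theorem 3, it suffices to assume that $g \in C^\infty(\mathbb{R}^d)$.

For notational convenience, let $f = U_og$. Beginning as before,
$$\begin{aligned}0 &= \frac{1}{2s(\epsilon)}\mathbb{E}\big[\langle\Lambda^{-1}(X_\epsilon - X), \nabla f(X_\epsilon) + \nabla f(X)\rangle\big] \\ &= \frac{1}{s(\epsilon)}\mathbb{E}\Big[\frac{1}{2}\langle\Lambda^{-1}(X_\epsilon - X), \nabla f(X_\epsilon) - \nabla f(X)\rangle + \langle\Lambda^{-1}(X_\epsilon - X), \nabla f(X)\rangle\Big] \\ &= \frac{1}{s(\epsilon)}\mathbb{E}\Big[\frac{1}{2}\langle\operatorname{Hess}f(X), \Lambda^{-1}(X_\epsilon - X)(X_\epsilon - X)^T\rangle_{H.S.} + \langle\Lambda^{-1}(X_\epsilon - X), \nabla f(X)\rangle + R\Big],\end{aligned} \tag{18}$$
where $R$ is the error in the Taylor approximation.

Now, by Taylor's theorem, there exists a real number $K$ depending on $f$, such that
$$|R| \le K\min\big\{|X_\epsilon - X|\,|\Lambda^{-1}(X_\epsilon - X)|,\ |X_\epsilon - X|^2\,|\Lambda^{-1}(X_\epsilon - X)|\big\} \le K\|\Lambda^{-1}\|_{op}\min\big\{|X_\epsilon - X|^2,\ |X_\epsilon - X|^3\big\}.$$
Breaking up the expectation over the sets on which $|X_\epsilon - X|$ is larger and smaller than a fixed $\rho > 0$,
$$\begin{aligned}\frac{1}{s(\epsilon)}\mathbb{E}|R| &\le \frac{K\|\Lambda^{-1}\|_{op}}{s(\epsilon)}\mathbb{E}\Big[|X_\epsilon - X|^3\,\mathbb{1}(|X_\epsilon - X| \le \rho) + |X_\epsilon - X|^2\,\mathbb{1}(|X_\epsilon - X| > \rho)\Big] \\ &\le \frac{K\|\Lambda^{-1}\|_{op}\,\rho}{s(\epsilon)}\mathbb{E}|X_\epsilon - X|^2 + \frac{K\|\Lambda^{-1}\|_{op}}{s(\epsilon)}\mathbb{E}\Big[|X_\epsilon - X|^2\,\mathbb{1}(|X_\epsilon - X| > \rho)\Big].\end{aligned}$$
The second term tends to zero as $\epsilon \to 0$ by condition (3); condition (2) implies that the first is bounded by $CK\|\Lambda^{-1}\|_{op}\,\rho$ for a constant $C$ depending on the distribution of $X$. Since $\rho > 0$ is arbitrary, it follows that
$$\lim_{\epsilon \to 0}\frac{1}{s(\epsilon)}\mathbb{E}|R| = 0.$$
For the rest of (18),
$$\begin{aligned}&\lim_{\epsilon \to 0}\frac{1}{s(\epsilon)}\mathbb{E}\Big[\frac{1}{2}\langle\operatorname{Hess}f(X), \Lambda^{-1}(X_\epsilon - X)(X_\epsilon - X)^T\rangle_{H.S.} + \langle\Lambda^{-1}(X_\epsilon - X), \nabla f(X)\rangle\Big] \\ &\qquad = \mathbb{E}\Big[\langle\operatorname{Hess}f(X), \Sigma\rangle_{H.S.} - \langle X, \nabla f(X)\rangle + \frac{1}{2}\langle\operatorname{Hess}f(X), \Lambda^{-1}E'\rangle_{H.S.} + \langle\nabla f(X), \Lambda^{-1}E\rangle\Big],\end{aligned}$$
where conditions (1) and (2) together with the boundedness of $\operatorname{Hess}f$ and $\nabla f$ have been used. That is (making use of the definition of $f$),
$$\mathbb{E}g(X) - \mathbb{E}g(\Sigma^{1/2}Z) = \mathbb{E}\Big[\frac{1}{2}\langle\operatorname{Hess}f(X), \Lambda^{-1}E'\rangle_{H.S.} + \langle\nabla f(X), \Lambda^{-1}E\rangle\Big]. \tag{19}$$
As in the proof of Theorem 3,
$$\mathbb{E}\Big|\frac{1}{2}\langle\operatorname{Hess}f(X), \Lambda^{-1}E'\rangle_{H.S.}\Big| \le \frac{1}{2}\|\Lambda^{-1}\|_{op}\,\mathbb{E}\|E'\|_{H.S.}\,\min\Big\{\frac{1}{2}\tilde{M}_2(g),\ M_1(g)\,\|\Sigma^{-1/2}\|_{op}\Big\}$$
and
$$\mathbb{E}|\langle\nabla f(X), \Lambda^{-1}E\rangle| \le \|\Lambda^{-1}\|_{op}\,\mathbb{E}|E|\,M_1(g).$$
This completes the proof.

□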



Remarks:

(1) Note that the condition

(3') $\lim_{\epsilon \to 0}\frac{1}{s(\epsilon)}\mathbb{E}|X_\epsilon - X|^3 = 0$

is stronger than condition (3) of Theorem 4 and may be used instead; this is what is done in the application given in Section 3.

(2) In [16], singular covariance matrices are treated by comparing to a nearby non-singular covariance matrix rather than directly. However, this is not necessary, as all the proofs except those explicitly involving $\Sigma^{-1/2}$ go through for non-negative definite $\Sigma$.
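The comparison in Remark (1) is a Markov-type estimate: on the event $|X_\epsilon - X| > \rho$ one has $|X_\epsilon - X|^2 \le \rho^{-1}|X_\epsilon - X|^3$, so

```latex
\frac{1}{s(\epsilon)}\,\mathbb{E}\Big[|X_\epsilon - X|^2\,\mathbb{1}\big(|X_\epsilon - X| > \rho\big)\Big]
\le \frac{1}{\rho}\cdot\frac{1}{s(\epsilon)}\,\mathbb{E}|X_\epsilon - X|^3
\xrightarrow[\epsilon \to 0]{} 0
\quad \text{for each fixed } \rho > 0.
```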



3. Examples 

3.1. Runs on the line. The following example was treated by Reinert and Röllin [16] as an example of the embedding method. It should be emphasized that showing that the number of $d$-runs on the line is asymptotically Gaussian seems infeasible with Stein's original method of exchangeable pairs because of the failure of condition (1) from the introduction, but in [16], the random variable of interest is embedded in a random vector whose components can be shown to be jointly Gaussian by making use of the more general condition (5) of the introduction. The example is reworked here making use of the analysis of [16] together with Theorem 3, yielding an improved rate of convergence.

Let $X_1, \ldots, X_n$ be independent $\{0, 1\}$-valued random variables, with $\mathbb{P}(X_i = 1) = p$ and $\mathbb{P}(X_i = 0) = 1 - p$. For $d \ge 1$, define the (centered) number of $d$-runs as
$$V_d := \sum_{m=1}^n\big(X_mX_{m+1}\cdots X_{m+d-1} - p^d\big),$$
assuming the torus convention, namely that $X_{n+k} = X_k$ for any $k$. For this example, we assume that $d < \frac{n}{2}$. To make an exchangeable pair, $d - 1$ sequential elements of $X := (X_1, \ldots, X_n)$ are resampled. That is, let $I$ be a uniformly distributed element of $\{1, \ldots, n\}$ and let $X_1', \ldots, X_n'$ be independent copies of the $X_i$. Let $X'$ be constructed from $X$ by replacing $X_I, \ldots, X_{I+d-2}$ with $X_I', \ldots, X_{I+d-2}'$. Then $(X, X')$ is an exchangeable pair, and,
defining $V_i' := V_i(X')$ for $i \ge 1$, it is easy to see that
$$\begin{aligned}V_i' - V_i = &-\sum_{m=I-i+1}^{I+d-2}X_m\cdots X_{m+i-1} + \sum_{m=I+d-i}^{I+d-2}X_m'\cdots X_{I+d-2}'\,X_{I+d-1}\cdots X_{m+i-1} \\ &+ \sum_{m=I}^{I+d-i-1}X_m'\cdots X_{m+i-1}' + \sum_{m=I-i+1}^{I-1}X_m\cdots X_{I-1}\,X_I'\cdots X_{m+i-1}',\end{aligned}$$
where sums $\sum_{m=a}^{b}$ are taken to be zero if $a > b$. It follows that
$$\mathbb{E}[V_i' - V_i \mid X] = -\frac{d + i - 2}{n}V_i + \frac{2}{n}\sum_{k=1}^{i-1}p^{i-k}V_k. \tag{20}$$
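The drift formula (20) can be verified exactly on small examples. The sketch below assumes a short cycle ($n = 9$, $d = 3$, $p = 1/3$, chosen only for speed), computes $\mathbb{E}[V_i' - V_i \mid X]$ by averaging over the starting position $I$ and over all $2^{d-1}$ replacement windows with their Bernoulli probabilities, and compares the result with $-\frac{d+i-2}{n}V_i + \frac{2}{n}\sum_{k=1}^{i-1}p^{i-k}V_k$ in exact rational arithmetic. The helper names are hypothetical.

```python
from fractions import Fraction
from itertools import product

def run_count(x, i, p):
    """Centered number V_i of i-runs of ones on the cycle Z/nZ."""
    n = len(x)
    total = sum(all(x[(m + k) % n] for k in range(i)) for m in range(n))
    return total - n * p ** i

def conditional_drift(x, i, d, p):
    """E[V_i(X') - V_i(X) | X = x] when the d - 1 consecutive sites starting
    at a uniform position I are replaced by fresh Bernoulli(p) variables."""
    n = len(x)
    v = run_count(x, i, p)
    total = Fraction(0)
    for start in range(n):
        for window in product((0, 1), repeat=d - 1):
            prob = Fraction(1)
            for bit in window:
                prob *= p if bit else 1 - p
            y = list(x)
            for k, bit in enumerate(window):
                y[(start + k) % n] = bit
            total += prob * (run_count(y, i, p) - v)
    return total / n

def predicted_drift(x, i, d, p):
    """Right-hand side of the drift formula (20)."""
    n = len(x)
    v = [None] + [run_count(x, k, p) for k in range(1, i + 1)]
    return (-Fraction(d + i - 2, n) * v[i]
            + Fraction(2, n) * sum(p ** (i - k) * v[k] for k in range(1, i)))
```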



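Standard calculations show that, for $1 \le j \le i \le d$,
$$\mathbb{E}[V_iV_j] = np^i(1-p)\sum_{k=0}^{j-1}(i - j + 1 + 2k)p^k. \tag{21}$$
In particular, it follows from this expression that $np^i(1-p) \le \mathbb{E}V_i^2 \le i^2np^i(1-p)$, suggesting the renormalized random variables
$$W_i := \frac{V_i}{\sqrt{np^i(1-p)}}. \tag{22}$$
It then follows from (21) that, for $1 \le i, j \le d$,
$$\sigma_{ij} := \mathbb{E}[W_iW_j] = p^{|i-j|/2}\sum_{k=0}^{(i \wedge j) - 1}(|i - j| + 1 + 2k)p^k, \tag{23}$$
and from (20) that if $W := (W_1, \ldots, W_d)$, then $\mathbb{E}[W' - W \mid X] = -\Lambda W$, where
$$\Lambda = \frac{1}{n}\begin{pmatrix} d-1 & 0 & 0 & \cdots & 0 \\ -2p^{1/2} & d & 0 & \cdots & 0 \\ -2p & -2p^{1/2} & d+1 & \cdots & 0 \\ \vdots & & & \ddots & \vdots \\ -2p^{(d-1)/2} & -2p^{(d-2)/2} & \cdots & -2p^{1/2} & 2(d-1) \end{pmatrix},$$
that is, $\Lambda_{ii} = \frac{d+i-2}{n}$ and $\Lambda_{ij} = -\frac{2}{n}p^{(i-j)/2}$ for $i > j$. Condition (1) of Theorem 3 thus applies with $E = 0$ and $\Lambda$ as above.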
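Similarly, the covariance formula (21), in the form $\mathbb{E}[V_iV_j] = np^i(1-p)\sum_{k=0}^{j-1}(i - j + 1 + 2k)p^k$ for $j \le i$, can be checked by summing over all $2^n$ configurations of a short cycle. The sketch assumes $n = 8$ and $p = 2/5$ purely for illustration; the helper names are hypothetical.

```python
from fractions import Fraction
from itertools import product

def run_count(x, i, p):
    """Centered number V_i of i-runs of ones on the cycle Z/nZ."""
    n = len(x)
    total = sum(all(x[(m + k) % n] for k in range(i)) for m in range(n))
    return total - n * p ** i

def covariance_formula(n, i, j, p):
    """E[V_i V_j] according to (21); symmetric in i and j."""
    if j > i:
        i, j = j, i
    return n * p ** i * (1 - p) * sum((i - j + 1 + 2 * k) * p ** k for k in range(j))

def covariance_exact(n, i, j, p):
    """E[V_i V_j] by summing over all 2^n configurations with their probabilities."""
    total = Fraction(0)
    for x in product((0, 1), repeat=n):
        prob = Fraction(1)
        for bit in x:
            prob *= p if bit else 1 - p
        total += prob * run_count(x, i, p) * run_count(x, j, p)
    return total
```

Dividing (21) by $\sqrt{np^i(1-p)}\sqrt{np^j(1-p)}$ gives (23), so the same check covers the normalized covariances $\sigma_{ij}$.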

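To apply Theorem 3, an estimate on $\|\Lambda^{-1}\|_{op}$ is needed. Following Reinert and Röllin, we make use of known estimates of condition numbers for triangular matrices (see, e.g., the survey of Higham [9]). First, write $\Lambda =: \Lambda_E\Lambda_D$, where $\Lambda_D$ is diagonal with the same diagonal entries as $\Lambda$ and $\Lambda_E$ is lower triangular with diagonal entries equal to one and $(\Lambda_E)_{ij} = \frac{\Lambda_{ij}}{\Lambda_{jj}}$ for $i > j$. Note that all non-diagonal entries of $\Lambda_E$ are bounded in absolute value by $\frac{2\sqrt{p}}{d-1}$. From Lemeire [10], this implies the bounds
$$\|\Lambda_E^{-1}\|_1 \le \Big(1 + \frac{2\sqrt{p}}{d-1}\Big)^{d-1} \quad\text{and}\quad \|\Lambda_E^{-1}\|_\infty \le \Big(1 + \frac{2\sqrt{p}}{d-1}\Big)^{d-1}.$$
From Higham, $\|\Lambda_E^{-1}\|_{op} \le \sqrt{\|\Lambda_E^{-1}\|_1\|\Lambda_E^{-1}\|_\infty}$, thus
$$\|\Lambda_E^{-1}\|_{op} \le \Big(1 + \frac{2\sqrt{p}}{d-1}\Big)^{d-1} \le e^{2\sqrt{p}}.$$
Trivially, $\|\Lambda_D^{-1}\|_{op} = \frac{n}{d-1}$, and thus
$$\|\Lambda^{-1}\|_{op} \le \|\Lambda_D^{-1}\|_{op}\,\|\Lambda_E^{-1}\|_{op} \le \frac{ne^{2\sqrt{p}}}{d-1}.$$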
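The estimate on $\|\Lambda^{-1}\|_{op}$ can be compared with a direct computation. The sketch below builds the lower triangular matrix $\Lambda$ of the runs example (diagonal entries $\frac{d+i-2}{n}$, entries $-\frac{2}{n}p^{(i-j)/2}$ below the diagonal), inverts it by forward substitution, estimates the largest singular value of $\Lambda^{-1}$ by power iteration, and checks $\frac{n}{d-1} \le \|\Lambda^{-1}\|_{op} \le \frac{ne^{2\sqrt{p}}}{d-1}$; the lower bound is simply the largest diagonal entry of $\Lambda^{-1}$. Parameter values and helper names are illustrative assumptions.

```python
import math

def lambda_matrix(n, d, p):
    """Lambda of the runs example (0-indexed): Lambda[r][r] = (d + r - 1)/n and
    Lambda[r][c] = -(2/n) p^{(r-c)/2} for c < r."""
    L = [[0.0] * d for _ in range(d)]
    for r in range(d):
        L[r][r] = (d + r - 1) / n
        for c in range(r):
            L[r][c] = -(2.0 / n) * p ** ((r - c) / 2)
    return L

def lower_triangular_inverse(L):
    """Solve L x = e_col by forward substitution for every column."""
    d = len(L)
    inv = [[0.0] * d for _ in range(d)]
    for col in range(d):
        for r in range(d):
            rhs = (1.0 if r == col else 0.0) - sum(L[r][k] * inv[k][col] for k in range(r))
            inv[r][col] = rhs / L[r][r]
    return inv

def op_norm(A, iters=2000):
    """Largest singular value by power iteration on A^T A. The all-ones start
    is adequate here because Lambda^{-1} has non-negative entries."""
    size = len(A)
    v = [1.0] * size
    for _ in range(iters):
        Av = [sum(A[r][c] * v[c] for c in range(size)) for r in range(size)]
        w = [sum(A[r][c] * Av[r] for r in range(size)) for c in range(size)]
        norm = math.sqrt(sum(t * t for t in w))
        v = [t / norm for t in w]
    Av = [sum(A[r][c] * v[c] for c in range(size)) for r in range(size)]
    return math.sqrt(sum(t * t for t in Av))
```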



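Now observe that, if condition (1) of Theorem 3 is satisfied with $E = 0$, then it follows from exchangeability that $\mathbb{E}\big[(W' - W)(W' - W)^T\big] = 2\Lambda\Sigma$, and thus we may take
$$E' := \mathbb{E}\big[(W' - W)(W' - W)^T - 2\Lambda\Sigma \,\big|\, W\big].$$
Since $E'$ then has mean zero, it follows that
$$\mathbb{E}\|E'\|_{H.S.} \le \big(\mathbb{E}\|E'\|_{H.S.}^2\big)^{1/2} = \Big(\sum_{i,j=1}^d\operatorname{Var}\Big(\mathbb{E}\big[(W_i' - W_i)(W_j' - W_j) \,\big|\, W\big]\Big)\Big)^{1/2}.$$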



It was determined by Reinert and Röllin that

05^5 

Var (E [iW: - W.){W; - W,)\W]) < 

thus 

Finally, note that
$$\mathbb{E}|W' - W|^3 \le \sqrt{d}\sum_{i=1}^d\mathbb{E}|W_i' - W_i|^3.$$

Reinert and Röllin showed that

E\{w^-w^i)iw;-Wj)iw',-Wky ^"^^ 



< 



for all i,j, k, thus 

E\W' - < 



^3/2p3d/2^l _p)3/2- 

Using these bounds in inequality (13) from Theorem 3 yields the following.

1 d 



Theorem 5. For $W = (W_1, \ldots, W_d)$ defined as in (22) with $d < \frac{n}{2}$, $\Sigma = [\sigma_{ij}]$ given by (23), and $h \in C^3(\mathbb{R}^d)$,

(25) $\big|\mathbb{E}h(W) - \mathbb{E}h(\Sigma^{1/2}Z)\big| \le$

where $Z$ is a standard $d$-dimensional Gaussian random vector.



15^/Qd^M2{h) _^ AOd'^/^Msih) 
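Remarks: Compare this result to the bound (26) on $\big|\mathbb{E}h(W) - \mathbb{E}h(\Sigma^{1/2}Z)\big|$ obtained in [16], which is expressed in terms of the coordinatewise quantities
$$|h|_2 = \sup_{i,j}\Big\|\frac{\partial^2h}{\partial x_i\partial x_j}\Big\|_\infty \quad\text{and}\quad |h|_3 = \sup_{i,j,k}\Big\|\frac{\partial^3h}{\partial x_i\partial x_j\partial x_k}\Big\|_\infty$$
rather than $M_2(h)$ and $M_3(h)$.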



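3.2. Eigenfunctions of the Laplacian. Consider a compact Riemannian manifold $M$ with metric $g$. Integration with respect to the normalized volume measure is denoted $d\mathrm{vol}$, thus $\int_M 1\,d\mathrm{vol} = 1$. For coordinates $\{\frac{\partial}{\partial x_i}\}$ on $M$, define
$$(G(x))_{ij} = g_{ij}(x) = \Big\langle\frac{\partial}{\partial x_i}\Big|_x, \frac{\partial}{\partial x_j}\Big|_x\Big\rangle, \qquad g(x) = \det(G(x)), \qquad \big(g^{ij}(x)\big) := G(x)^{-1}.$$
Define the gradient $\nabla f$ of $f : M \to \mathbb{R}$ by
$$\nabla f(x) = \sum_{i,j}\frac{\partial f}{\partial x_i}(x)\,g^{ij}(x)\,\frac{\partial}{\partial x_j}\Big|_x$$
and the Laplacian $\Delta_gf$ of $f$ by
$$\Delta_gf(x) = \frac{1}{\sqrt{g(x)}}\sum_{j,k}\frac{\partial}{\partial x_j}\Big(\sqrt{g}\,g^{jk}\frac{\partial f}{\partial x_k}\Big)(x).$$
The function $f : M \to \mathbb{R}$ is an eigenfunction of $\Delta_g$ with eigenvalue $-\mu$ if $\Delta_gf(x) = -\mu f(x)$ for all $x \in M$; it is known (see, e.g., [3]) that on a compact Riemannian manifold $M$, the eigenvalues of $\Delta_g$ form a sequence $0 \ge -\mu_1 \ge -\mu_2 \ge \cdots \searrow -\infty$. Eigenspaces associated to different eigenvalues are orthogonal in $L_2(M)$ and all eigenfunctions of $\Delta_g$ are elements of $C^\infty(M)$.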

Let $X$ be a uniformly distributed random point of $M$. The value distribution of a function $f$ on $M$ is the distribution (on $\mathbb{R}$) of the random variable $f(X)$. In [11], a general bound was given for the total variation distance between the value distribution of an eigenfunction and a Gaussian distribution, in terms of the eigenvalue and the gradient of $f$. The proof made use of a univariate version of Theorem 4. Essentially the same analysis is used here to prove a multivariate version of that theorem.

Let $f_1, \ldots, f_k$ be a sequence of orthonormal (in $L_2$) eigenfunctions of $\Delta_g$ with corresponding eigenvalues $-\mu_i$ (some of the $\mu_i$ may be the same if the eigenspaces of $M$ have dimension greater than 1). Define the random vector $W \in \mathbb{R}^k$ by $W_i := f_i(X)$. We will apply Theorem 4 to show that $W$ is approximately distributed as a standard Gaussian random vector (i.e., $\Sigma = I_k$).

For $\epsilon > 0$, an exchangeable pair $(W, W_\epsilon)$ is constructed from $W$ as follows. Given $X$, choose an element $V \in S_XM$ (the unit sphere of the tangent space to $M$ at $X$) according to the uniform measure on $S_XM$, and let $X_\epsilon = \exp_X(\epsilon V)$. That is, pick a direction at random, and move a distance $\epsilon$ from $X$ along a geodesic in that direction. It was shown in [11] that this construction produces an exchangeable pair of random points of $M$; it follows that if $W_\epsilon := (f_1(X_\epsilon), \ldots, f_k(X_\epsilon))$, then $(W, W_\epsilon)$ is an exchangeable pair of random vectors in $\mathbb{R}^k$.

In order to identify $\Lambda$, $E$ and $E'$ so as to apply Theorem 4, first let $\gamma : [0, \epsilon] \to M$ be a constant-speed geodesic such that $\gamma(0) = X$, $\gamma(\epsilon) = X_\epsilon$, and $\gamma'(0) = V$. Then applying Taylor's theorem on $\mathbb{R}$ to the function $f_i \circ \gamma$ yields



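$$\begin{aligned}f_i(X_\epsilon) - f_i(X) &= \epsilon\,\frac{d(f_i\circ\gamma)}{dt}\Big|_{t=0} + \frac{\epsilon^2}{2}\,\frac{d^2(f_i\circ\gamma)}{dt^2}\Big|_{t=0} + O(\epsilon^3) \\ &= \epsilon\,d_Xf_i(V) + \frac{\epsilon^2}{2}\,\frac{d^2(f_i\circ\gamma)}{dt^2}\Big|_{t=0} + O(\epsilon^3),\end{aligned} \tag{27}$$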
where the coefficient implicit in the $O(\epsilon^3)$ depends on $f_i$ and $\gamma$, and $d_Xf_i$ denotes the differential of $f_i$ at $X$. Recall that $d_xf_i(v) = \langle\nabla f_i(x), v\rangle$ for $v \in T_xM$ and the gradient $\nabla f_i(x)$ defined as above. Now, for $X$ fixed, $V$ is distributed according to normalized Lebesgue measure on $S_XM$ and $d_Xf_i$ is a linear functional on $T_XM$. It follows that
$$\mathbb{E}[d_Xf_i(V) \mid X] = \mathbb{E}[d_Xf_i(-V) \mid X] = -\mathbb{E}[d_Xf_i(V) \mid X],$$
thus $\mathbb{E}[d_Xf_i(V) \mid X] = 0$. This implies that
$$\lim_{\epsilon \to 0}\frac{1}{\epsilon^2}\mathbb{E}[f_i(X_\epsilon) - f_i(X) \mid X]$$
exists and is finite; we will take $s(\epsilon) = \epsilon^2$. Indeed, it is well-known (see, e.g., Theorem 11.12 of [8]) that
$$\lim_{\epsilon \to 0}\frac{1}{\epsilon^2}\mathbb{E}[f_i(X_\epsilon) - f_i(X) \mid X] = \frac{1}{2n}\Delta_gf_i(X) = -\frac{\mu_i}{2n}f_i(X) \tag{28}$$
for $n = \dim(M)$. It follows that $\Lambda = \frac{1}{2n}\operatorname{diag}(\mu_1, \ldots, \mu_k)$ and $E = 0$. The expression $\frac{1}{\epsilon^2}\mathbb{E}[W_\epsilon - W \mid W]$ satisfies the $L_1$ convergence requirement of Theorem 4 since the $f_i$ are necessarily smooth and $M$ is compact. Furthermore, it is immediate that
$$\|\Lambda^{-1}\|_{op} = 2n\max_{1 \le i \le k}\frac{1}{\mu_i}.$$
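Formula (28) can be sanity-checked on the flat torus $\mathbb{R}^2/\mathbb{Z}^2$ with its standard metric, where geodesics are straight lines, so $X_\epsilon = X + \epsilon(\cos\theta, \sin\theta)$ with $\theta$ uniform, and $n = 2$. The sketch assumes the (hypothetical) test eigenfunction $f(x) = \sqrt{2}\cos(2\pi x_1)$, with $\mu = 4\pi^2$, so the right-hand side of (28) is $-\frac{\mu}{2n}f = -\pi^2f$. The average over directions uses equally spaced angles, which is very accurate for smooth periodic integrands.

```python
import math

def eigenfunction(x1, x2):
    """f(x) = sqrt(2) cos(2 pi x_1): an L^2-normalized Laplace eigenfunction on
    R^2/Z^2 with eigenvalue -4 pi^2 (it does not depend on x2)."""
    return math.sqrt(2) * math.cos(2 * math.pi * x1)

def scaled_mean_increment(x1, x2, eps, m=256):
    """(1/eps^2) E[f(X_eps) - f(X) | X = x], averaging over m equally spaced
    directions on the unit circle of the tangent plane."""
    base = eigenfunction(x1, x2)
    total = 0.0
    for k in range(m):
        theta = 2 * math.pi * k / m
        total += eigenfunction(x1 + eps * math.cos(theta), x2 + eps * math.sin(theta)) - base
    return total / (m * eps ** 2)

for point in ((0.1, 0.2), (0.37, 0.9)):
    print(scaled_mean_increment(*point, 1e-3), -math.pi ** 2 * eigenfunction(*point))
```

For this $f$ the exact average is $f(x)\,(J_0(2\pi\epsilon) - 1)/\epsilon^2 = -\pi^2f(x) + O(\epsilon^2)$, where $J_0$ is the Bessel function, so the two printed numbers agree to about $10^{-4}$.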

For the second condition of Theorem HJ it is necessary to determine 

lhniE[(iy, - WUW, - W),\X] = limiE[(/,(X,) - - /,(X))|X]. 

By the expansion ( l27l) . 

E [(MX,) - /,(X))(/,(X,) - /,(X))|X] = e^E [{dxMV)){dxfj{V))\X] + 0{e'). 

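Choose coordinates $\{x_r\}_{r=1}^n$ in a neighborhood of $X$ which are orthonormal at $X$. Then

$$\nabla f(X) = \sum_{r=1}^n \frac{\partial f}{\partial x_r}(X)\,\frac{\partial}{\partial x_r}$$

for any function $f \in C^\infty(M)$, thus

$$(d_Xf_i(V))(d_Xf_j(V)) = \langle \nabla f_i(X), V\rangle\langle\nabla f_j(X), V\rangle = \sum_{r,s=1}^n \frac{\partial f_i}{\partial x_r}(X)\frac{\partial f_j}{\partial x_s}(X)\,V_rV_s.$$

Since $V$ is uniformly distributed on a Euclidean sphere, $E[V_rV_s \mid X] = \frac{1}{n}\delta_{rs}$. Making use of this fact yields

$$\lim_{\varepsilon\to0}\frac{1}{\varepsilon^2}E[(W_\varepsilon - W)_i(W_\varepsilon - W)_j \mid X] = E[(d_Xf_i(V))(d_Xf_j(V)) \mid X] = \frac{1}{n}\langle\nabla f_i(X), \nabla f_j(X)\rangle,$$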
thus condition (2) is satisfied with

$$E' = \frac{1}{n}\Big[\langle\nabla f_i(X), \nabla f_j(X)\rangle\Big]_{i,j=1}^k - 2\Lambda.$$

(As before, the convergence requirement is satisfied since the $f_i$ are smooth and $M$ is compact.)
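The only probabilistic input in this computation is $E[V_rV_s \mid X] = \frac{1}{n}\delta_{rs}$. This identity gives $E[\langle a, V\rangle\langle b, V\rangle] = \frac{1}{n}\langle a, b\rangle$ for fixed vectors $a, b$, which play the roles of $\nabla f_i(X)$ and $\nabla f_j(X)$ in the orthonormal frame. A Monte Carlo sketch of the identity follows. The vectors, sample size and seed are arbitrary choices.

```python
import math
import random


def uniform_unit_vector(n, rng):
    """Uniform point on S^{n-1}: a normalized standard Gaussian vector."""
    g = [rng.gauss(0.0, 1.0) for _ in range(n)]
    norm = math.sqrt(sum(c * c for c in g))
    return [c / norm for c in g]


def mc_bilinear(a, b, samples=100_000, seed=1):
    """Monte Carlo estimate of E[<a, V><b, V>] for V uniform on the unit sphere in R^n."""
    rng = random.Random(seed)
    acc = 0.0
    for _ in range(samples):
        v = uniform_unit_vector(len(a), rng)
        acc += sum(x * y for x, y in zip(a, v)) * sum(x * y for x, y in zip(b, v))
    return acc / samples


a = [1.0, 2.0, 0.0, -1.0]
b = [3.0, 0.0, 1.0, 1.0]
estimate = mc_bilinear(a, b)
exact = sum(x * y for x, y in zip(a, b)) / len(a)   # <a, b> / n
print(estimate, exact)
```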
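By Stokes' theorem,

$$E\langle\nabla f_i(X), \nabla f_j(X)\rangle = -E[f_i(X)\Delta_g f_j(X)] = \mu_j E[f_i(X)f_j(X)] = \mu_i\delta_{ij},$$

so $2\Lambda = \frac{1}{n}\big[E\langle\nabla f_i(X), \nabla f_j(X)\rangle\big]_{i,j=1}^k$, and thus

$$E\|E'\|_{H.S.} = \frac{1}{n}E\sqrt{\sum_{i,j=1}^k\Big[\langle\nabla f_i(X), \nabla f_j(X)\rangle - E\langle\nabla f_i(X), \nabla f_j(X)\rangle\Big]^2}.$$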
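The Stokes identity $E\langle\nabla f_i(X), \nabla f_j(X)\rangle = \mu_i\delta_{ij}$ can be checked numerically on a simple example. The sketch below assumes the flat torus $\mathbb{R}^2/\mathbb{Z}^2$ with the standard metric and two hand-picked orthonormal eigenfunctions: $f_1 = \sqrt{2}\cos(2\pi x_1)$ with $\mu_1 = 4\pi^2$, and $f_2 = \cos(2\pi(x_1 + x_2)) + \cos(2\pi(x_1 - x_2))$ with $\mu_2 = 8\pi^2$. Averaging over a uniform grid is exact, up to rounding, for the low-degree trigonometric polynomials involved.

```python
import math

N = 32  # grid points per axis


def grad_f1(x1, x2):
    """f1 = sqrt(2) cos(2 pi x1); eigenvalue -mu1 with mu1 = 4 pi^2."""
    return (-2 * math.pi * math.sqrt(2) * math.sin(2 * math.pi * x1), 0.0)


def grad_f2(x1, x2):
    """f2 = cos(2 pi (x1 + x2)) + cos(2 pi (x1 - x2)); eigenvalue -mu2 with mu2 = 8 pi^2."""
    sp = math.sin(2 * math.pi * (x1 + x2))
    sm = math.sin(2 * math.pi * (x1 - x2))
    return (-2 * math.pi * (sp + sm), -2 * math.pi * (sp - sm))


def mean_gradient_gram(grads, n=N):
    """Grid average over the torus of the matrix [<grad f_i(x), grad f_j(x)>]_{i,j}."""
    k = len(grads)
    G = [[0.0] * k for _ in range(k)]
    for p in range(n):
        for q in range(n):
            g = [h(p / n, q / n) for h in grads]
            for i in range(k):
                for j in range(k):
                    G[i][j] += g[i][0] * g[j][0] + g[i][1] * g[j][1]
    return [[G[i][j] / n ** 2 for j in range(k)] for i in range(k)]


G = mean_gradient_gram([grad_f1, grad_f2])
print(G)   # approximately [[4 pi^2, 0], [0, 8 pi^2]]
```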



Finally, (27) gives immediately that

$$E\big[|W_\varepsilon - W|^3 \mid X\big] = O(\varepsilon^3)$$

(where the implicit constants depend on the $f_i$ and on $k$), thus condition (3) of Theorem 4 is satisfied.

Altogether, we have proved the following.

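Theorem 6. Let $M$ be a compact Riemannian manifold and $f_1, \ldots, f_k$ an orthonormal (in $L_2(M)$) sequence of eigenfunctions of the Laplacian on $M$, with corresponding eigenvalues $-\mu_i$. Let $X$ be a uniformly distributed random point of $M$. Then if $W := (f_1(X), \ldots, f_k(X))$,

$$d_W(W, Z) \le \sqrt{\frac{2}{\pi}}\left(\max_{1\le i\le k}\frac{1}{\mu_i}\right)E\sqrt{\sum_{i,j=1}^k\Big[\langle\nabla f_i(X), \nabla f_j(X)\rangle - E\langle\nabla f_i(X), \nabla f_j(X)\rangle\Big]^2}.$$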
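To make the right-hand side of Theorem 6 concrete, here is a sketch that evaluates it by grid quadrature on the flat torus $\mathbb{R}^2/\mathbb{Z}^2$. It assumes the prefactor is $\sqrt{2/\pi}\,\max_i \mu_i^{-1}$. The example eigenfunctions $f_a = \sqrt{2}\cos(2\pi x_1)$ and $f_b = \sqrt{2}\cos(2\pi x_2)$ (both with $\mu = 4\pi^2$) and the function names are illustrative choices.

```python
import math


def theorem6_rhs(grads, mus, n_grid=64):
    """Grid approximation, on R^2/Z^2, of
    sqrt(2/pi) * max_i(1/mu_i) * E sqrt(sum_{i,j} (G_ij(X) - E G_ij(X))^2),
    where G_ij(x) = <grad f_i(x), grad f_j(x)> and X is uniform on the torus."""
    k = len(grads)
    grams = []
    for p in range(n_grid):
        for q in range(n_grid):
            g = [h(p / n_grid, q / n_grid) for h in grads]
            grams.append([[g[i][0] * g[j][0] + g[i][1] * g[j][1] for j in range(k)]
                          for i in range(k)])
    mean = [[sum(G[i][j] for G in grams) / len(grams) for j in range(k)] for i in range(k)]
    hs = sum(math.sqrt(sum((G[i][j] - mean[i][j]) ** 2 for i in range(k) for j in range(k)))
             for G in grams) / len(grams)
    return math.sqrt(2 / math.pi) * max(1 / m for m in mus) * hs


def grad_a(x1, x2):
    """f = sqrt(2) cos(2 pi x1), mu = 4 pi^2."""
    return (-2 * math.pi * math.sqrt(2) * math.sin(2 * math.pi * x1), 0.0)


def grad_b(x1, x2):
    """f = sqrt(2) cos(2 pi x2), mu = 4 pi^2."""
    return (0.0, -2 * math.pi * math.sqrt(2) * math.sin(2 * math.pi * x2))


MU = 4 * math.pi ** 2
bound = theorem6_rhs([grad_a, grad_b], [MU, MU])
print(bound)
```

For this pair the bound is of order one. That is expected: each coordinate $\sqrt{2}\cos(2\pi U)$ has an arcsine-type law far from Gaussian, so no small error should be expected without many frequencies.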



Example: The torus. 

In this example, Theorem 6 is applied to the value distributions of eigenfunctions on flat tori. The functions considered here are random functions; that is, they are linear combinations of eigenfunctions with random coefficients.

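Let $(M, g)$ be the torus $\mathbb{T}^n = \mathbb{R}^n/\mathbb{Z}^n$, with the metric given by the symmetric positive-definite bilinear form $B$:

$$\langle x, y\rangle_B = \langle Bx, y\rangle.$$

With this metric, the Laplacian on $\mathbb{T}^n$ is given by

$$\Delta_B f(x) = \sum_{j,k=1}^n (B^{-1})_{jk}\,\frac{\partial^2 f}{\partial x_j\,\partial x_k}(x).$$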



Eigenfunctions of $\Delta_B$ are given by the real and imaginary parts of functions of the form

$$f_v(x) = e^{2\pi i\langle Bv, x\rangle}$$

for vectors $v \in \mathbb{R}^n$ such that $Bv$ has integer components, with corresponding eigenvalue $-\mu_v = -(2\pi\|v\|_B)^2$.
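The eigenvalue formula can be checked by finite differences. The sketch below assumes the hypothetical form $B = \begin{pmatrix}2&1\\1&2\end{pmatrix}$ on $\mathbb{T}^2$ and the integer vector $Bv = (1, 2)$, so $v = B^{-1}(1,2) = (0, 1)$ and $\|v\|_B^2 = \langle v, Bv\rangle = 2$. It uses the real part $\cos(2\pi\langle Bv, x\rangle)$ and compares $\Delta_B f$ with $-(2\pi\|v\|_B)^2 f$.

```python
import math

# Hypothetical example form B (symmetric positive definite) and its inverse.
B = [[2.0, 1.0], [1.0, 2.0]]
B_INV = [[2.0 / 3.0, -1.0 / 3.0], [-1.0 / 3.0, 2.0 / 3.0]]


def eigenfunction(m):
    """Real part of exp(2 pi i <Bv, x>) when Bv = m (an integer vector)."""
    return lambda x: math.cos(2 * math.pi * (m[0] * x[0] + m[1] * x[1]))


def laplacian_B(func, x, h=1e-4):
    """Central differences for sum_{j,k} (B^{-1})_{jk} d^2 func / dx_j dx_k at x."""
    unit = ((1.0, 0.0), (0.0, 1.0))

    def at(sj, ej, sk, ek):
        # func evaluated at x + sj*h*e_j + sk*h*e_k
        return func([x[i] + sj * h * ej[i] + sk * h * ek[i] for i in range(2)])

    total = 0.0
    for j in range(2):
        for k in range(2):
            ej, ek = unit[j], unit[k]
            d2 = (at(1, ej, 1, ek) - at(1, ej, -1, ek)
                  - at(-1, ej, 1, ek) + at(-1, ej, -1, ek)) / (4 * h * h)
            total += B_INV[j][k] * d2
    return total


m = (1, 2)                                               # integer vector Bv
v = [B_INV[0][0] * m[0] + B_INV[0][1] * m[1], B_INV[1][0] * m[0] + B_INV[1][1] * m[1]]
mu = (2 * math.pi) ** 2 * (v[0] * m[0] + v[1] * m[1])    # (2 pi ||v||_B)^2, as <v, Bv> = <v, m>
f = eigenfunction(m)
x = [0.13, 0.37]
print(laplacian_B(f, x), -mu * f(x))
```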

Consider a collection of $k$ random eigenfunctions $\{f_j\}_{j=1}^k$ of $\Delta_B$ on the torus which are linear combinations of eigenfunctions with random coefficients:

$$f_j(x) = \mathrm{Re}\Big(\sum_{v\in V_j} a_v e^{2\pi i\langle Bv, x\rangle}\Big),$$

where $V_j$ is a finite collection of vectors $v$ such that $Bv$ has integer components and $\langle v, Bv\rangle = \frac{\mu_j}{4\pi^2}$ for each $v \in V_j$, and $\{\{a_v\}_{v\in V_j} : 1 \le j \le k\}$ are $k$ independent random vectors (indexed by $j$) on the spheres of radius $\sqrt{2}$ in $\mathbb{R}^{|V_j|}$. Assume that $v + w \ne 0$ for $v \in V_r$ and $w \in V_s$ ($r$ and $s$ may be equal) and that $V_r \cap V_s = \emptyset$ for $r \ne s$; it follows easily that the $f_j$ are orthonormal in $L_2(\mathbb{T}^n)$.
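The orthonormality claim is easy to test on a small case. The sketch below assumes $B = I$ on $\mathbb{T}^2$ (so $Bv = v$) with the hypothetical sets $V_1 = \{(1,0), (0,1)\}$ and $V_2 = \{(1,1), (1,-1)\}$. These satisfy the stated conditions: equal lengths within each set, $v + w \ne 0$, and disjoint sets. It draws the coefficients on spheres of radius $\sqrt{2}$ and computes the $L_2$ Gram matrix by exact grid quadrature.

```python
import math
import random

# Hypothetical example with B = I on R^2/Z^2, so Bv = v.
V_SETS = [[(1, 0), (0, 1)], [(1, 1), (1, -1)]]


def sphere_coefficients(m, rng, radius=math.sqrt(2.0)):
    """Uniform random point on the sphere of the given radius in R^m."""
    g = [rng.gauss(0.0, 1.0) for _ in range(m)]
    s = math.sqrt(sum(c * c for c in g))
    return [radius * c / s for c in g]


def make_eigenfunction(vs, coeffs):
    """f(x) = Re sum_v a_v exp(2 pi i <v, x>) = sum_v a_v cos(2 pi <v, x>)."""
    def f(x1, x2):
        return sum(a * math.cos(2 * math.pi * (v[0] * x1 + v[1] * x2))
                   for a, v in zip(coeffs, vs))
    return f


def l2_gram(funcs, n_grid=16):
    """L_2 Gram matrix on the torus via a uniform grid (exact for these low frequencies)."""
    pts = [(p / n_grid, q / n_grid) for p in range(n_grid) for q in range(n_grid)]
    vals = [[f(x1, x2) for (x1, x2) in pts] for f in funcs]
    k = len(funcs)
    return [[sum(vals[i][t] * vals[j][t] for t in range(len(pts))) / len(pts)
             for j in range(k)] for i in range(k)]


rng = random.Random(0)
fs = [make_eigenfunction(vs, sphere_coefficients(len(vs), rng)) for vs in V_SETS]
gram = l2_gram(fs)   # the 2 x 2 identity, whatever coefficients were drawn
print(gram)
```

The radius $\sqrt{2}$ compensates for the factor $\frac12$ in $\int_{\mathbb{T}^n}\cos^2(2\pi\langle Bv, x\rangle)\,dx = \frac12$.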

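To apply Theorem 6, first note that

$$\nabla_B f_j(x) = \mathrm{Re}\Big(\sum_{v\in V_j} a_v \nabla_B e^{2\pi i\langle Bv, x\rangle}\Big) = \mathrm{Re}\Big(\sum_{v\in V_j}(2\pi i)\,a_v e^{2\pi i\langle Bv, x\rangle}\,v\Big),$$

using the fact that $B$ is symmetric.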
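It follows that

$$\langle\nabla_B f_r(x), \nabla_B f_s(x)\rangle_B = 2\pi^2\sum_{v\in V_r}\sum_{w\in V_s}a_va_w\langle v, w\rangle_B\,\mathrm{Re}\Big(e^{2\pi i\langle Bv - Bw, x\rangle} - e^{2\pi i\langle Bv + Bw, x\rangle}\Big). \qquad (29)$$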
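A pointwise check of the identity $\langle\nabla_B f_r, \nabla_B f_s\rangle_B = 2\pi^2\sum_{v,w}a_va_w\langle v,w\rangle_B\big[\cos(2\pi\langle Bv - Bw, x\rangle) - \cos(2\pi\langle Bv + Bw, x\rangle)\big]$ is sketched below. It uses $\nabla_B f = B^{-1}\,\partial f$, so that $\langle\nabla_B f, \nabla_B g\rangle_B = (\partial f)^T B^{-1}(\partial g)$. The form $B$, the sets (given through the integer vectors $m = Bv$) and the coefficients are hypothetical choices.

```python
import math

# Hypothetical example: each V_j is specified through the integer vectors m = Bv.
B = [[2.0, 1.0], [1.0, 2.0]]
B_INV = [[2.0 / 3.0, -1.0 / 3.0], [-1.0 / 3.0, 2.0 / 3.0]]
M_R, A_R = [(1, 0), (0, 1)], [0.6, math.sqrt(2.0 - 0.36)]   # a_r on the sphere of radius sqrt(2)
M_S, A_S = [(1, -1), (2, 1)], [1.0, -1.0]


def matvec(A, u):
    return [A[0][0] * u[0] + A[0][1] * u[1], A[1][0] * u[0] + A[1][1] * u[1]]


def dot(u, w):
    return u[0] * w[0] + u[1] * w[1]


def euclidean_grad(ms, coeffs, x):
    """Partial derivatives of f(x) = sum_v a_v cos(2 pi <Bv, x>), where Bv = m."""
    g = [0.0, 0.0]
    for a, m in zip(coeffs, ms):
        s = math.sin(2 * math.pi * dot(m, x))
        g[0] -= 2 * math.pi * a * s * m[0]
        g[1] -= 2 * math.pi * a * s * m[1]
    return g


def lhs(x):
    """<grad_B f_r, grad_B f_s>_B = (df_r)^T B^{-1} (df_s), because grad_B f = B^{-1} df."""
    return dot(euclidean_grad(M_R, A_R, x), matvec(B_INV, euclidean_grad(M_S, A_S, x)))


def rhs(x):
    """Right-hand side of (29), with <v, w>_B = v^T B w and v = B^{-1} m."""
    total = 0.0
    for a, m in zip(A_R, M_R):
        v = matvec(B_INV, m)
        for b, n in zip(A_S, M_S):
            w = matvec(B_INV, n)
            total += a * b * dot(v, matvec(B, w)) * (
                math.cos(2 * math.pi * (dot(m, x) - dot(n, x)))
                - math.cos(2 * math.pi * (dot(m, x) + dot(n, x))))
    return 2 * math.pi ** 2 * total


for x in ([0.1, 0.2], [0.37, 0.81], [0.5, 0.05]):
    print(lhs(x), rhs(x))
```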



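Let $X$ be a uniformly distributed random point on the torus. Let $E_a$ denote averaging over the coefficients and $E_X$ denote averaging over the random point $X$. To estimate $E_a d_W(W, Z)$ from Theorem 6, first apply the Cauchy-Schwarz inequality and then change the order of integration:

$$E_aE_X\sqrt{\sum_{r,s=1}^k\Big[\langle\nabla_Bf_r(X), \nabla_Bf_s(X)\rangle_B - E_X\langle\nabla_Bf_r(X), \nabla_Bf_s(X)\rangle_B\Big]^2} \le \sqrt{\sum_{r,s=1}^k E_XE_a\Big[\langle\nabla_Bf_r(X), \nabla_Bf_s(X)\rangle_B - E_X\langle\nabla_Bf_r(X), \nabla_Bf_s(X)\rangle_B\Big]^2}.$$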



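Start by computing $E_XE_a\langle\nabla_Bf_r(X), \nabla_Bf_s(X)\rangle_B^2$. From above,

$$\begin{aligned}\langle\nabla_Bf_r(x), \nabla_Bf_s(x)\rangle_B^2 = 2\pi^4\sum_{v,v'\in V_r}\sum_{w,w'\in V_s}a_va_{v'}a_wa_{w'}\langle v,w\rangle_B\langle v',w'\rangle_B\,\mathrm{Re}\Big(&e^{2\pi i\langle Bv-Bw-Bv'+Bw',x\rangle} - e^{2\pi i\langle Bv-Bw-Bv'-Bw',x\rangle}\\ &+ e^{2\pi i\langle Bv-Bw+Bv'-Bw',x\rangle} - e^{2\pi i\langle Bv-Bw+Bv'+Bw',x\rangle}\\ &- e^{2\pi i\langle Bv+Bw-Bv'+Bw',x\rangle} + e^{2\pi i\langle Bv+Bw-Bv'-Bw',x\rangle}\\ &- e^{2\pi i\langle Bv+Bw+Bv'-Bw',x\rangle} + e^{2\pi i\langle Bv+Bw+Bv'+Bw',x\rangle}\Big).\end{aligned}$$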
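Averaging over the coefficients $\{a_v\}$ using standard techniques (see Folland [6] for general formulae and [11] for a detailed explanation of the univariate version of this result), and then over the random point $X \in \mathbb{T}^n$, it is not hard to show that

$$E_aE_X\|\nabla_Bf_r(X)\|_B^4 = \mu_r^2 + \frac{(2\pi)^4}{|V_r|(|V_r|+2)}\Big[\frac{3}{2}\sum_{v\in V_r}\|v\|_B^4 + 2\sum_{\substack{v,w\in V_r\\ v\ne w}}\langle v, w\rangle_B^2\Big]$$

and, for $r \ne s$,

$$E_aE_X\langle\nabla_Bf_r(X), \nabla_Bf_s(X)\rangle_B^2 = \frac{(2\pi)^4}{|V_r||V_s|}\sum_{v\in V_r}\sum_{w\in V_s}\langle v, w\rangle_B^2.$$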
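The following sketch checks two closed forms against exact quadrature in a small case. The formulas checked are $E_aE_X\|\nabla_Bf_r\|_B^4 = \mu_r^2 + \frac{(2\pi)^4}{|V_r|(|V_r|+2)}\big[\frac32\sum_v\|v\|_B^4 + 2\sum_{v\ne w}\langle v,w\rangle_B^2\big]$ and, for $r \ne s$, $E_aE_X\langle\nabla_Bf_r, \nabla_Bf_s\rangle_B^2 = \frac{(2\pi)^4}{|V_r||V_s|}\sum_{v,w}\langle v,w\rangle_B^2$. It assumes $B = I$ on $\mathbb{T}^2$, $V_r = \{(3,4), (5,0)\}$ (equal lengths, not orthogonal) and $V_s = \{(0,5)\}$. With $|V_r| = 2$ the coefficients are $a_r = \sqrt{2}(\cos t, \sin t)$ with $t$ uniform, so both averages reduce to uniform grids.

```python
import math


def grad(vs, coeffs, x1, x2):
    """Gradient (B = I) of f(x) = sum_v a_v cos(2 pi <v, x>) on R^2/Z^2."""
    g0 = g1 = 0.0
    for a, (p, q) in zip(coeffs, vs):
        s = math.sin(2 * math.pi * (p * x1 + q * x2))
        g0 -= 2 * math.pi * a * s * p
        g1 -= 2 * math.pi * a * s * q
    return g0, g1


def quadrature(vr, vs=None, n_theta=16, n_grid=48):
    """E_a E_X of ||grad f_r||^4 (vs is None) or of <grad f_r, grad f_s>^2 (vs a singleton).

    Assumes |V_r| = 2, so a_r = sqrt(2)(cos t, sin t); uniform grids in t and x are exact
    for the trigonometric polynomials involved here."""
    total = 0.0
    for t in range(n_theta):
        th = 2 * math.pi * t / n_theta
        ar = (math.sqrt(2) * math.cos(th), math.sqrt(2) * math.sin(th))
        for p in range(n_grid):
            for q in range(n_grid):
                x1, x2 = p / n_grid, q / n_grid
                gr = grad(vr, ar, x1, x2)
                gs = gr if vs is None else grad(vs, (math.sqrt(2),), x1, x2)
                total += (gr[0] * gs[0] + gr[1] * gs[1]) ** 2
    return total / (n_theta * n_grid ** 2)


def dot(v, w):
    return v[0] * w[0] + v[1] * w[1]


def closed_form_same(vr):
    """mu_r^2 + (2 pi)^4 / (m(m+2)) * (3/2 sum ||v||^4 + 2 sum_{v != w} <v, w>^2)."""
    m = len(vr)
    mu = 4 * math.pi ** 2 * dot(vr[0], vr[0])
    off = sum(dot(v, w) ** 2 for i, v in enumerate(vr) for j, w in enumerate(vr) if i != j)
    return mu ** 2 + (2 * math.pi) ** 4 / (m * (m + 2)) * (
        1.5 * sum(dot(v, v) ** 2 for v in vr) + 2 * off)


def closed_form_cross(vr, vs):
    """(2 pi)^4 / (|V_r||V_s|) * sum_{v, w} <v, w>^2, for r != s."""
    return (2 * math.pi) ** 4 / (len(vr) * len(vs)) * sum(dot(v, w) ** 2 for v in vr for w in vs)


V_R = [(3, 4), (5, 0)]
V_S = [(0, 5)]
same_quad, same_exact = quadrature(V_R), closed_form_same(V_R)
cross_quad, cross_exact = quadrature(V_R, V_S), closed_form_cross(V_R, V_S)
print(same_quad, same_exact)
print(cross_quad, cross_exact)
```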



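Now,

$$E_X\|\nabla_Bf_r(X)\|_B^2 = 2\pi^2\sum_{v\in V_r}a_v^2\|v\|_B^2 = \mu_r$$

and

$$E_X\langle\nabla_Bf_r(X), \nabla_Bf_s(X)\rangle_B = 0$$

for $r \ne s$. It follows that

$$E_XE_a\|\nabla_Bf_r(X)\|_B^4 - \big(E_X\|\nabla_Bf_r(X)\|_B^2\big)^2 \le \frac{2(2\pi)^4}{|V_r|(|V_r|+2)}\sum_{v,w\in V_r}\langle v, w\rangle_B^2$$

and

$$E_XE_a\langle\nabla_Bf_r(X), \nabla_Bf_s(X)\rangle_B^2 = \frac{(2\pi)^4}{|V_r||V_s|}\sum_{v\in V_r}\sum_{w\in V_s}\langle v, w\rangle_B^2 \qquad (r \ne s),$$

and, applying Theorem 6, we have shown the following.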

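Theorem 7. Let the random orthonormal set of functions $\{f_r\}_{r=1}^k$ be defined on $\mathbb{T}^n$ as above, and let the random vector $W$ be defined by $W_i := f_i(X)$ for $X$ a random point of $\mathbb{T}^n$. Then

$$E_a d_W(W, Z) \le \frac{4\pi^2}{\min_r \mu_r}\sqrt{\frac{2}{\pi}}\sqrt{\sum_{r,s=1}^k\frac{2}{|V_r||V_s|}\sum_{v\in V_r}\sum_{w\in V_s}\langle v, w\rangle_B^2}.$$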
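The bound of Theorem 7 is explicit, so it can be evaluated directly. The following sketch assumes it has the form $\frac{4\pi^2}{\min_r\mu_r}\sqrt{\frac{2}{\pi}}\sqrt{\sum_{r,s}\frac{2}{|V_r||V_s|}\sum_{v\in V_r}\sum_{w\in V_s}\langle v,w\rangle_B^2}$ with $\mu_r = 4\pi^2\langle v, Bv\rangle$ for any $v \in V_r$. The helper names and the example sets (coordinate vectors with $B = I$) are illustrative.

```python
import math


def inner_B(B, v, w):
    """<v, w>_B = <Bv, w>."""
    n = len(v)
    return sum(B[i][j] * v[j] * w[i] for i in range(n) for j in range(n))


def theorem7_rhs(sets, B):
    """(4 pi^2 / min_r mu_r) sqrt(2/pi) sqrt(sum_{r,s} 2/(|V_r||V_s|) sum <v, w>_B^2),
    with mu_r = 4 pi^2 <v, Bv> for any v in V_r (equal B-lengths within each set assumed)."""
    mus = [4 * math.pi ** 2 * inner_B(B, vs[0], vs[0]) for vs in sets]
    total = 0.0
    for vr in sets:
        for vs in sets:
            s = sum(inner_B(B, v, w) ** 2 for v in vr for w in vs)
            total += 2.0 / (len(vr) * len(vs)) * s
    return 4 * math.pi ** 2 / min(mus) * math.sqrt(2 / math.pi) * math.sqrt(total)


def identity(n):
    return [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]


def basis(n, i, scale=1.0):
    return [scale if j == i else 0.0 for j in range(n)]


# One eigenfunction built from all n coordinate directions of T^n (B = I).
for n in (4, 9, 16):
    print(n, theorem7_rhs([[basis(n, i) for i in range(n)]], identity(n)))
```

In the example with a single set $V_1 = \{e_1, \ldots, e_n\}$ the bound equals $\sqrt{2/\pi}\,\sqrt{2/n}$, so it decays like $n^{-1/2}$. This illustrates the mechanism described in the remarks that follow.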



Remarks: Note that if the elements of $\bigcup_{r=1}^k V_r$ are mutually orthogonal, then the right-hand side becomes

$$\frac{1}{\min_r\mu_r}\sqrt{\frac{2}{\pi}}\sqrt{\sum_{r=1}^k\frac{2\mu_r^2}{|V_r|}};$$

thus if it is possible to choose the $V_r$ such that their sizes are large for large $n$, and the range of the $\mu_r$ is not too big, the error is small. One can thus find vectors of orthonormal eigenfunctions of $\mathbb{T}^n$ which are jointly Gaussian (and independent) in the limit as the dimension tends to infinity. This requires the matrix $B$ to admit large collections of vectors $v$ which are "close to orthogonal", have the same lengths with respect to $\langle\cdot,\cdot\rangle_B$, and have $Bv$ with integer components. It is possible to extend the analysis here, in a fairly straightforward manner, to require rather less of the matrix $B$ (essentially all the conditions here can be allowed to hold only approximately), but for simplicity's sake we include only this most basic version here. The univariate version of this relaxing of conditions is carried out in detail in [11].